NCAR / geocat-examples

GeoCAT-examples provides a gallery of visualization examples demonstrating how to reproduce plots from NCL Applications scripts with packages in Python. It also includes some longer form examples demonstrating how to use functionality from various GeoCAT packages.
https://geocat-examples.readthedocs.io
Apache License 2.0

Updating Polyg_19 #216

Closed michaelavs closed 3 years ago

michaelavs commented 4 years ago

Updating Polyg_19 so that the code runs faster

michaelavs commented 4 years ago

@hCraker I was wondering if you have any specific input on using shapefiles with Python? We are trying to reduce the memory and time needed to run polyg_19, as it causes our readthedocs builds to fail. The code block causing issues, around line 197, is:

```python
for shape in us.shapeRecords():
    if shape.record[3] == 'Alaska':
        plotRegion(shape, axin1, [None, 100], puertoRico=False, waterBody=False)
    elif shape.record[3] == 'Hawaii':
        plotRegion(shape, axin2, [-161, None], puertoRico=False, waterBody=False)
    else:
        plotRegion(shape, ax1, [None, None], puertoRico=False, waterBody=False)
```

I'm still pretty unsure how to use shapefiles, but I noticed you had submitted shapefile_1, so if you have any ideas on how to make this block more efficient, I am open to suggestions!

clyne commented 4 years ago

Hi Michaela,

The first step in solving any kind of performance problem like this is to narrow down the hotspot: the portion of the code that is taking the most time. The best way to do this is with a profiling tool of some sort. There may be better options, but cProfile is simple and may do the trick:

https://docs.python.org/2/library/profile.html

https://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script

cProfile will tell you how much time is spent in a function call chain. I would start by profiling each call to the function in the above loop. You may get lucky and find that one of the 'shapes' is an outlier. Failing that, you can start profiling individual calls within plotRegion to see which particular segment of plotRegion is taking up most of the time. Once you've identified the bottleneck, you can start thinking about how you might eliminate it. Perhaps it is inefficient code inside plotRegion that could be restructured, or perhaps it is a function that plotRegion calls, and a different parameter to that function might change the behavior.
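As a rough sketch of what profiling each call could look like: the snippet below wraps one call at a time in cProfile and records the total time per region. Note that `plot_region`, `profile_call`, the region names, and the sleep delays are all made up for illustration; they stand in for the real plotRegion and shapefile records, which I haven't run.

```python
import cProfile
import io
import pstats
import time


def plot_region(name, delay):
    """Hypothetical stand-in for plotRegion; sleeps to simulate work."""
    time.sleep(delay)


def profile_call(func, *args, **kwargs):
    """Run one call under cProfile; return (total_seconds, report_text)."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
    stats.print_stats(5)  # keep only the five most expensive entries
    return stats.total_tt, buffer.getvalue()


# Made-up regions and delays; "Alaska" is deliberately the outlier here.
regions = {"Alaska": 0.05, "Hawaii": 0.01, "Other": 0.01}

timings = {}
for name, delay in regions.items():
    timings[name], report = profile_call(plot_region, name, delay)

slowest = max(timings, key=timings.get)
print(f"slowest region: {slowest}")
print(report)  # cProfile report for the last call profiled
```

In the real script you would pass plotRegion and its actual arguments to `profile_call` inside the loop, then read the report for whichever shape turns out to be slowest.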

In addition to cProfile - or another profiler - simply putting timers around segments of code to see how long they take can also be illuminating. Hope this helps!
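For the timer approach, something like this small context manager might be enough. It is only a sketch: the `timer` helper and the "load"/"plot" labels are hypothetical, and the sleeps are placeholders for whatever code segments you want to measure.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label, results):
    """Store the wall-clock time of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


results = {}

with timer("load", results):
    time.sleep(0.01)  # placeholder for, e.g., reading the shapefile

with timer("plot", results):
    time.sleep(0.05)  # placeholder for, e.g., the plotting loop

for label, seconds in results.items():
    print(f"{label}: {seconds:.3f} s")
```

Wrapping progressively smaller segments this way is a quick way to home in on the slow part before reaching for a full profiler.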

cc: @erogluorhan