SlideRuleEarth / sliderule-python

SlideRule Earth Example Notebooks: On-demand, cloud-based processing of satellite mission data (NASA ICESat-2, GEDI, ArcticDEM/REMA, HLS)
https://slideruleearth.io/rtd/
BSD 3-Clause "New" or "Revised" License

icesat2.h5 processing very slow #23

Closed: jpswinski closed this issue 3 years ago

jpswinski commented 3 years ago

When reading a dataset using the icesat2.h5 API, it takes a very long time to process the results that come back. The cause is the block of Python code that puts the pieces of the h5 response back together. (The h5 endpoint streams the results back in small fragments, which then have to be reassembled before parsing.)

https://github.com/ICESat2-SlideRule/sliderule-python/blob/480d1bfd4933588c3f1b3f4333650fddd4ef2f31/sliderule/icesat2.py#L523-L528

I did some performance analysis and saw that all of the time spent in icesat2.h5 is spent in the above section of code.
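For anyone who wants to reproduce this kind of measurement, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. It is not the analysis that was actually run for this issue. `concat_fragments` is a hypothetical stand-in for the reassembly code linked above, and `profile_call` is a helper name made up for illustration.

```python
import cProfile
import io
import pstats


def concat_fragments(fragments):
    """Hypothetical stand-in for the reassembly step: join fragments
    by repeatedly appending to an immutable bytes object."""
    data = b""
    for chunk in fragments:
        data += chunk
    return data


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()
```

Calling `profile_call(concat_fragments, fragments)` and printing the second return value lists where the time went; in a real session the function under test would be the icesat2.h5 call itself.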

The icesat2.h5p function executes much faster and can (and should) be used instead of icesat2.h5, but it would still be nice to resolve the performance bottleneck in icesat2.h5.

jpswinski commented 3 years ago

Using bytearrays in the above commit greatly improved performance.
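To illustrate why a bytearray helps, here is a minimal sketch. It is not the code from the commit. It assumes each streamed fragment arrives as an `(offset, payload)` pair and that the total size is known up front; the names `reassemble_slow` and `reassemble_fast` are hypothetical.

```python
def reassemble_slow(fragments):
    """Append each payload to an immutable bytes object.

    Every += builds a new bytes object and copies everything received
    so far, so the total work grows roughly quadratically with the
    number of fragments. Assumes fragments arrive in offset order.
    """
    data = b""
    for _offset, payload in fragments:
        data += payload
    return data


def reassemble_fast(fragments, total_size):
    """Write each payload into a preallocated, mutable bytearray.

    Each slice assignment copies only the new payload, so the total
    work is linear in the size of the dataset.
    """
    buf = bytearray(total_size)
    for offset, payload in fragments:
        buf[offset:offset + len(payload)] = payload
    return bytes(buf)


def split_into_fragments(data, size):
    """Helper for the demo: cut data into (offset, payload) pairs."""
    return [(i, data[i:i + size]) for i in range(0, len(data), size)]
```

Both functions return the same bytes for in-order fragments; the bytearray version also tolerates out-of-order arrival because it places each payload by offset.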