Wanted to link to your nice discussion on the topic of general cloud access to NASA data @betolink https://github.com/nsidc/earthaccess/discussions/251 :) We've discussed the utility of pulling out h5coro as a stand-alone tool, but it would likely be scoped to ICESat-2 only, rather than tackling all possible HDF files out there. But perhaps someone could run with it or extend over time...
@betolink at the prompting of @scottyhq and @tsutterley, we've created a pure Python implementation of H5Coro. It is still very early on in its development, but as of last week we opened up the git repo and are ready to start to let others take a look at it.
You can find the git repo at: https://github.com/ICESat2-SlideRule/h5coro
Alternatively, you can install the Python package h5coro via pip or conda (from conda-forge).
So far, I've been able to use it to successfully read ICESat-2 ATL03 and ATL06 data, and GEDI L2 data. It is also showing a significant speedup over using s3fs, though it still isn't as fast as using SlideRule. There are still features that need to be added, and the interface needs some work to make it an easier drop-in for h5py... but those are all coming.
Let us know if you have any suggestions or find any issues. I'd be happy to continue the discussion here, or offline.
Future discussions on h5coro will take place within the repo at https://github.com/ICESat2-SlideRule/h5coro
I wonder if it is possible to extract H5Coro into a stand-alone library and use it the same way we use h5py (understanding that H5Coro doesn't support certain operations). My primary interest would be to accelerate access to HDF files on S3. If we use the h5py library with S3, a very common access pattern is to open the object with fsspec (for example through s3fs) and pass the resulting file-like object to h5py.
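For reference, the h5py-over-fsspec pattern described here could look roughly like the minimal sketch below. It assumes h5py (2.9 or later, which accepts Python file-like objects) and s3fs are installed. The helper name `read_s3_dataset`, the URL check, and the bucket in the usage comment are hypothetical and only for illustration; real NASA buckets typically also require Earthdata credentials.

```python
def read_s3_dataset(url, dataset_path, anon=False):
    """Read one dataset from an HDF5 file on S3 using s3fs + h5py.

    `read_s3_dataset` is a hypothetical helper, not part of h5py,
    s3fs, or h5coro.
    """
    if not url.startswith("s3://"):
        raise ValueError(f"expected an s3:// URL, got {url!r}")

    # Imported lazily so this module can be loaded even where the
    # optional dependencies are not installed.
    import h5py
    import s3fs

    # anon=True only works for public buckets; otherwise s3fs picks up
    # credentials from the environment.
    fs = s3fs.S3FileSystem(anon=anon)

    # fs.open returns a file-like object that h5py can read from directly.
    with fs.open(url, mode="rb") as fileobj:
        with h5py.File(fileobj, mode="r") as h5:
            # [()] reads the whole dataset into memory.
            return h5[dataset_path][()]


# Usage (hypothetical bucket and granule name):
# heights = read_s3_dataset(
#     "s3://my-bucket/ATL03_example.h5", "gt1l/heights/h_ph"
# )
```

Every metadata lookup h5py makes in this pattern can turn into a separate S3 range request, which is the overhead a drop-in reader would aim to reduce.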
Would it eventually be possible to use H5Coro as a drop-in replacement for h5py/s3fs for read-only operations on S3?