GlacioHack / geoutils

Analysis of georeferenced rasters, vectors and point clouds
https://geoutils.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Add functionalities to read/save in zarr format? #506

Open adehecq opened 8 months ago

adehecq commented 8 months ago

This is a list of things that could be imported from our RAGMAC exercise code, mostly related to creating stacks from multiple rasters and saving/reading to/from zarr format (taken from personal notes and saved here for the future). This might also be more straightforward once we have the xarray accessor?

rhugonnet commented 8 months ago

This might also be more straightforward once we have the xarray accessor?

Definitely!

For a time series stack, we had recently concluded that it would be better to have it in a separate package... But spatiotemporal netCDF stacks are also increasingly supported by recent efforts, which we could simply wrap, mirroring our API as we do for other functionalities:

friedrichknuth commented 8 months ago

For reference, here is a packaged version of the code developed during the RAGMAC exercise and relevant to this topic.

Of interest might be the create_stack.py utility and create_zarr_stack() functionality.

For now, the approach goes from GeoTIFFs to Zarr by reprojecting the data onto a common grid and writing out temporary NetCDF files. The final step is to write out the Zarr file with chunks that each encompass the full time series for the pixels within one spatial tile.
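
To make the chunk layout concrete, here is a minimal pure-Python sketch of the idea: each chunk spans the whole time axis and only a tile of the spatial grid. The function names, the stack shape and the 512-pixel tile size are all assumptions made for illustration, not taken from the RAGMAC code.

```python
from itertools import product
from math import ceil


def time_series_chunk_shape(shape, spatial_chunk):
    """Return a (time, y, x) chunk shape spanning the full time axis."""
    n_time, n_y, n_x = shape
    chunk_y, chunk_x = spatial_chunk
    # Clip the tile size to the array extent so small rasters give one chunk.
    return (n_time, min(chunk_y, n_y), min(chunk_x, n_x))


def iter_chunk_slices(shape, chunks):
    """Yield one (time, y, x) tuple of slices per chunk, in row-major order."""
    counts = [ceil(n / c) for n, c in zip(shape, chunks)]
    for index in product(*(range(k) for k in counts)):
        yield tuple(
            slice(i * c, min((i + 1) * c, n))
            for i, c, n in zip(index, chunks, shape)
        )


# Hypothetical stack: 40 acquisition dates on a 1000 x 1500 pixel grid.
stack_shape = (40, 1000, 1500)
chunks = time_series_chunk_shape(stack_shape, spatial_chunk=(512, 512))
chunk_slices = list(iter_chunk_slices(stack_shape, chunks))
```

With this layout, reading the time series of a single pixel touches exactly one chunk. In xarray, a comparable layout is usually requested with something like `ds.chunk({"time": -1, "y": 512, "x": 512})` before calling `ds.to_zarr(...)`, where the dimension names are again assumptions about the dataset.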

There might be better approaches, such as those mentioned above. Using libraries like kerchunk to define and read/write optimally sized data chunks pulled from various formats that support byte range requests could also be worth revisiting.
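
The byte-range idea can be illustrated without any library: a reference table maps each chunk key to a location inside an existing file, so a reader fetches only the bytes it needs instead of converting the file. This is a conceptual sketch only. It does not use kerchunk's API, and the chunk keys, file contents and helper name are made up for the example.

```python
import os
import tempfile

# Hypothetical source file holding two chunks stored back to back.
chunk_a = b"A" * 16
chunk_b = b"B" * 24
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(chunk_a + chunk_b)
    source_path = f.name

# Reference table: chunk key -> (path, byte offset, byte length).
references = {
    "data/0.0.0": (source_path, 0, len(chunk_a)),
    "data/0.0.1": (source_path, len(chunk_a), len(chunk_b)),
}


def read_chunk(key):
    """Read only the bytes of one chunk, as a byte-range request would."""
    path, offset, length = references[key]
    with open(path, "rb") as src:
        src.seek(offset)
        return src.read(length)


first = read_chunk("data/0.0.0")
second = read_chunk("data/0.0.1")
os.remove(source_path)
```

For remote data, the same lookup would translate into an HTTP range request rather than a local `seek`, which is why the source format needs to support byte-range access.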

adehecq commented 8 months ago

For a time series stack, we had recently concluded that it would be better to have it in a separate package...

Good point! I was just archiving my personal notes on RAGMAC and copied them over, but had forgotten about this decision. In any case, it is probably useful to have some interface with zarr, though not necessarily one targeted at data stacks; for those cases we can direct users to GTSA.