datacommonsorg / data

Apache License 2.0

Xarray Integration #1024

Open alxmrs opened 3 weeks ago

alxmrs commented 3 weeks ago

Many raster datasets useful for Data Commons can be opened with Xarray (https://xarray.dev, https://github.com/pydata/xarray). These include data in NetCDF, HDF5, GRIB, GeoTIFF and, most notably, Zarr. With projects like Kerchunk and VirtualiZarr (https://github.com/zarr-developers/VirtualiZarr), more scientific datasets can (will?) be cloud-optimized, making Xarray a good single interface for working with raster data in a cloud-native way.
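To make the "single interface" point concrete, here is a minimal sketch of how one entry point could cover these formats. The `ENGINES` table and the `guess_engine` and `open_raster` helpers are hypothetical names made up for illustration, and the suffix-to-engine mapping is an assumption: the `cfgrib` and `rasterio` engines only exist when the cfgrib and rioxarray packages are installed, `h5netcdf` only reads HDF5 files laid out in a netCDF4-compatible way, and Xarray can often pick an engine on its own.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file suffix to an Xarray backend engine name.
# Assumption: optional backends (cfgrib, rioxarray, h5netcdf, zarr) are installed.
ENGINES = {
    ".nc": "netcdf4",
    ".h5": "h5netcdf",
    ".hdf5": "h5netcdf",
    ".grib": "cfgrib",
    ".grib2": "cfgrib",
    ".tif": "rasterio",
    ".tiff": "rasterio",
    ".zarr": "zarr",
}


def guess_engine(path: str) -> str:
    """Return the engine name for a path or URL, based only on its suffix."""
    # Strip a trailing slash so directory-style stores like "x.zarr/" still match.
    suffix = PurePosixPath(path.rstrip("/")).suffix.lower()
    if suffix not in ENGINES:
        raise ValueError(f"no engine known for {path!r}")
    return ENGINES[suffix]


def open_raster(path: str, **kwargs):
    """Open any supported raster file as an xarray.Dataset."""
    # Imported lazily so guess_engine stays usable without Xarray installed.
    import xarray as xr

    return xr.open_dataset(path, engine=guess_engine(path), **kwargs)
```

With something like this, `open_raster("example.nc")` and `open_raster("example.zarr")` (both placeholder paths) would return the same kind of `xarray.Dataset` object, so downstream ingestion code would not need to care about the on-disk format.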

In the Earth observation and modeling space, nearly all (level 3+) datasets are compatible with Xarray, which makes it a good intersection point for ingesting data from NASA, NOAA, ECMWF, ESA, NCAR, and so on. I'm personally unfamiliar with the types of datasets in the BioMed/Bioinformatics world, but Xarray is a critical intersection point there, too.