ESIPFed / NUMfocusFallDev


cloud optimized netCDF and zarr #4

Open · jhamman opened this issue 6 years ago

jhamman commented 6 years ago

As part of the Pangeo project, we have been exploring the concept of "cloud-optimized netCDF", building on the idea of "cloud-optimized GeoTIFF". Zarr is an open-source Python library and storage specification "providing an implementation of chunked, compressed, N-dimensional arrays." The spec is simple, clearly documented, and well suited to cloud object storage.
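To make the "chunked, compressed" layout concrete, here is a minimal pure-Python sketch of how a 2-D array maps onto keys under the Zarr v2 spec: a JSON `.zarray` metadata document plus one compressed object per chunk, named by the chunk indices joined with `.`. This is illustrative only and is not the zarr library: a plain `dict` stands in for a cloud object store, and `write_zarr_v2_array` and `read_element` are hypothetical helpers. Reading a single element touches only `.zarray` and the one chunk that contains it, which is why the format fits object storage well.

```python
import json
import math
import struct
import zlib


def write_zarr_v2_array(store, name, data, chunks, fill_value=0):
    """Write a 2-D list of ints into `store` using Zarr v2-style keys.

    `store` is any dict-like mapping of str -> bytes, standing in for an
    object store where each key is one object.
    """
    nrows, ncols = len(data), len(data[0])
    crow, ccol = chunks
    meta = {
        "zarr_format": 2,
        "shape": [nrows, ncols],
        "chunks": [crow, ccol],
        "dtype": "<i4",  # little-endian 32-bit signed integers
        "compressor": {"id": "zlib", "level": 1},
        "fill_value": fill_value,
        "order": "C",
        "filters": None,
    }
    store[f"{name}/.zarray"] = json.dumps(meta).encode()
    for i in range(math.ceil(nrows / crow)):
        for j in range(math.ceil(ncols / ccol)):
            values = []
            for r in range(i * crow, (i + 1) * crow):
                for c in range(j * ccol, (j + 1) * ccol):
                    # Edge chunks are stored at full chunk size, padded with fill_value.
                    inside = r < nrows and c < ncols
                    values.append(data[r][c] if inside else fill_value)
            raw = struct.pack(f"<{len(values)}i", *values)
            # Chunk key: chunk indices joined by "." (e.g. "x/1.2").
            store[f"{name}/{i}.{j}"] = zlib.compress(raw, 1)
    return store


def read_element(store, name, r, c):
    """Fetch one element by reading only the metadata and a single chunk."""
    meta = json.loads(store[f"{name}/.zarray"])
    crow, ccol = meta["chunks"]
    raw = zlib.decompress(store[f"{name}/{r // crow}.{c // ccol}"])
    values = struct.unpack(f"<{crow * ccol}i", raw)
    return values[(r % crow) * ccol + (c % ccol)]


store = {}
data = [[r * 10 + c for c in range(5)] for r in range(3)]  # 3 x 5 array
write_zarr_v2_array(store, "x", data, chunks=(2, 2))
print(sorted(store))
# ['x/.zarray', 'x/0.0', 'x/0.1', 'x/0.2', 'x/1.0', 'x/1.1', 'x/1.2']
print(read_element(store, "x", 2, 4))  # 24
```

With `(2, 2)` chunks, the 3 x 5 array becomes a 2 x 3 grid of chunk objects; real zarr adds filters, other codecs, groups, and attributes on top of this same key scheme.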

Last year, we (@rabernat, myself, and others from the xarray/dask/pangeo projects) wrote an experimental xarray backend for zarr, and we have been testing it on public clouds ever since. The community is eager to see some formal effort put behind these concepts.

This proposal would do the following:

Other possible development objectives include:

per: https://twitter.com/rabernat/status/1039210134600396800

NumFOCUS project: Xarray
ESIP member institution: NCAR

cc @mrocklin @rabernat @shoyer @alimanfoo @WardF

alimanfoo commented 6 years ago

Thanks @jhamman, I fully support this proposal.

Regarding other possible development objectives, it might also be worth linking to the work that @tjcrone, @shikharsg and @dazzag24 are doing to add support for Azure blob storage (https://github.com/zarr-developers/zarr/pull/293), and to the work @martindurant is doing on support for consolidated metadata (https://github.com/zarr-developers/zarr/pull/268). Along with https://github.com/zarr-developers/zarr/pull/252, these are all working towards extending and optimising support for cloud storage. Although we have, or are close to having, working solutions across multiple cloud platforms, I think there is still work to be done to improve performance and robustness.
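Consolidated metadata targets a specific cloud cost: without it, opening a hierarchy needs one request per `.zgroup`/`.zarray`/`.zattrs` object. Here is a sketch of the general idea, which copies every metadata document into a single `.zmetadata` key so that opening takes one GET. The key name and JSON layout follow what zarr-python's `consolidate_metadata` writes in released versions, but PR #268 was still under review at this point, so treat those details as an assumption. `CountingStore`, `consolidate`, and `open_consolidated` are hypothetical helpers, and the metadata documents are abbreviated.

```python
import json

# Suffixes of Zarr v2 metadata keys.
META_SUFFIXES = (".zarray", ".zgroup", ".zattrs")


class CountingStore(dict):
    """A dict standing in for an object store; counts GET-style reads."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.gets = 0

    def __getitem__(self, key):
        self.gets += 1
        return super().__getitem__(key)


def consolidate(store):
    """Copy all metadata documents into one '.zmetadata' object."""
    metadata = {
        key: json.loads(value)
        for key, value in store.items()  # dict.items() bypasses __getitem__
        if key.endswith(META_SUFFIXES)
    }
    store[".zmetadata"] = json.dumps(
        {"zarr_consolidated_format": 1, "metadata": metadata}
    ).encode()
    return metadata


def open_consolidated(store):
    """Load every metadata document with a single read."""
    return json.loads(store[".zmetadata"])["metadata"]


store = CountingStore({
    ".zgroup": b'{"zarr_format": 2}',
    "temp/.zarray": b'{"zarr_format": 2, "shape": [4], "chunks": [2]}',
    "temp/.zattrs": b'{"units": "K"}',
    "temp/0": b"chunk bytes",
    "temp/1": b"chunk bytes",
})
consolidate(store)
store.gets = 0
meta = open_consolidated(store)
print(store.gets)            # 1
print(meta["temp/.zattrs"])  # {'units': 'K'}
```

On a real object store each avoided GET is a network round trip, so the savings grow with the number of arrays and groups in a dataset.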

It may also be worth mentioning, as a possible development objective, work towards implementations of the Zarr storage specification in other programming languages. @jakirkham has reached out and initiated a number of conversations; see https://github.com/zarr-developers/zarr/issues/291, https://github.com/zarr-developers/zarr/issues/289, https://github.com/zarr-developers/zarr/issues/286, https://github.com/zarr-developers/zarr/issues/285, https://github.com/zarr-developers/zarr/issues/284, https://github.com/zarr-developers/zarr/issues/279.

WardF commented 6 years ago

Tagging @dennisheimbigner. We are looking at these issues (where the netCDF and zarr data models and specifications intersect) as we begin our work on adding native zarr support to the core netCDF C library.

jhamman commented 6 years ago

@WardF and @DennisHeimbigner - any additional thoughts here?