gjoseph92 / stackstac

Turn a STAC catalog into a dask-based xarray
https://stackstac.readthedocs.io
MIT License

NetCDF Time Dimension #237

Closed pinkerltm closed 3 months ago

pinkerltm commented 11 months ago

Hello!

We tailored our own STAC Catalog / API with stac-tools (pgSTAC backend, FastAPI), indexing publicly available precipitation data from here: https://public.hub.geosphere.at/datahub/resources/spartacus-v2-1d-1km/filelisting/RR/SPARTACUS2-DAILY_RR_2022.nc

As you can see, there is one NetCDF file for each year, and each file holds 365 days, so the time dimension should hold 365 labels per file.

If we stack items from our catalog (for example, two years), I get an xarray with only one time slot for each whole file.

Can you give me advice on which STAC extensions we should focus on so that stackstac gets enough metadata to successfully create that xarray?

clausmichele commented 10 months ago

Hi @pinkerltm, could you please share the full code you're using to reproduce the issue?

gjoseph92 commented 3 months ago

Hey @pinkerltm, sorry to take so long to get back to this. Unfortunately, this is just not a use-case stackstac (and STAC, in a sense) is really designed for right now.

A few issues:

  1. The STAC spec doesn't have a way to describe an Item as having multiple time labels (365, in your case). The closest you could get is start_datetime and end_datetime, but that's just a range: stackstac (or any other tool) wouldn't know that there are 365 specific dates in that range, versus 4, or 4000. Maybe there's a STAC extension for this, but certainly nothing that stackstac supports.
  2. stackstac is built around the assumption that each Item is 2D, basically—it doesn't have a temporal component. I haven't tested it or thought it through much, but I'd be a little surprised if computing this even worked.
  3. stackstac isn't very performant with NetCDF data. I've never even tested reading it, so I don't even know if it works.
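
To make the first point concrete, here is a minimal sketch in plain Python (it is not stackstac or STAC library code). It shows that one start_datetime / end_datetime pair fits many different time axes equally well. The property values and the helper `evenly_spaced_labels` are hypothetical, made up only for illustration.

```python
from datetime import datetime

# Hypothetical Item properties for one yearly file: only a range is recorded.
properties = {
    "start_datetime": "2022-01-01T00:00:00+00:00",
    "end_datetime": "2022-12-31T00:00:00+00:00",
}


def evenly_spaced_labels(start: datetime, end: datetime, count: int) -> list[datetime]:
    """Return `count` evenly spaced timestamps from `start` to `end`, inclusive."""
    if count < 2:
        raise ValueError("count must be at least 2")
    span = end - start
    # Multiply before dividing so the last label lands exactly on `end`.
    return [start + span * i / (count - 1) for i in range(count)]


start = datetime.fromisoformat(properties["start_datetime"])
end = datetime.fromisoformat(properties["end_datetime"])

# Each of these axes is consistent with the same range; nothing in the
# Item metadata says which one (if any) matches the file contents.
candidates = {n: evenly_spaced_labels(start, end, n) for n in (4, 365, 4000)}
for n, labels in candidates.items():
    print(n, labels[0].isoformat(), labels[-1].isoformat())
```

All three candidate axes start and end on the same timestamps, which is why a reader of the Item alone can't pick the daily one.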

Overall, my advice here is that stackstac isn't really the right tool. If you're trying to make some NetCDF data more "modern" and accessible via STAC, I'd look instead at converting it to zarr or something. You could look at some Planetary Computer datasets that are in zarr to get a sense for how they do it.
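
As a rough sketch of that direction, assuming xarray, numpy and pandas are installed: combining the yearly data along `time` with xarray keeps every daily label, and the result can then be written to a zarr store. The synthetic `fake_year` datasets, the variable name `RR`, the grid size and the helper `write_zarr` are all assumptions for illustration; I haven't run this against the real SPARTACUS files.

```python
import numpy as np
import pandas as pd
import xarray as xr


def fake_year(year: int) -> xr.Dataset:
    """Build a tiny synthetic stand-in for one yearly file (daily steps, 2x3 grid)."""
    time = pd.date_range(f"{year}-01-01", f"{year}-12-31", freq="D")
    data = np.zeros((time.size, 2, 3), dtype="float32")
    return xr.Dataset(
        {"RR": (("time", "y", "x"), data)},  # "RR" is an assumed variable name
        coords={"time": time, "y": [0, 1], "x": [0, 1, 2]},
    )


# With real files, xr.open_dataset(path) per year (or xr.open_mfdataset,
# which needs dask) would take the place of fake_year.
combined = xr.concat([fake_year(2022), fake_year(2023)], dim="time")
print(combined.sizes["time"])  # 730 daily labels: 365 + 365


def write_zarr(ds: xr.Dataset, store_path: str) -> None:
    """Write the combined dataset to a zarr store (requires the zarr package)."""
    ds.to_zarr(store_path, mode="w")
```

Calling `write_zarr(combined, "spartacus_rr.zarr")` (a hypothetical path) would produce one store whose `time` coordinate carries every day, rather than one label per file.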