gjoseph92 / stackstac

Turn a STAC catalog into a dask-based xarray
https://stackstac.readthedocs.io
MIT License

unexpected shape of tile set #167

Closed solomon-negusse closed 2 years ago

solomon-negusse commented 2 years ago

Hello! I have looked through #152 and am opening this because that doesn't seem to be my issue. I'm reading a collection of 10 deg x 10 deg tile assets of a global dataset. All the assets are single band and have the same datetime, so I'm expecting an output of shape (1, 1, y_merged, x_merged), but what I'm getting is (number_of_tiles, number_of_tiles, y_merged, x_merged), and the band coordinates are the asset keys. Is this an unsupported use case, or could it be a bug? Thanks!

BTW, reading just one of these tiles gives the expected output and zonal statistics on it give correct results.

gjoseph92 commented 2 years ago

I think you might be interested in https://github.com/gjoseph92/stackstac/issues/66. stackstac will always load each item as a separate coordinate (so it'll always be (number_of_tiles, ??, y_merged, x_merged)), but it sounds like you want them flattened.

I'm curious why there are number_of_tiles bands though. That's unexpected. Is it possible to share a minimal reproducible example, or the STAC metadata you're passing into stackstac.stack?

For the flattening, if there aren't many tiles (sounds like there aren't), you could just pass the result to stackstac.mosaic. You may also want to try odc-stac, since that supports a groupby parameter that'll let you flatten all the inputs more efficiently than mosaic (it doesn't load STAC metadata into xarray coordinates though, which may matter to you). I'm planning to add something like this eventually to stackstac, but haven't had the time for it.
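To make the flattening idea concrete, here is a minimal pure-Python sketch of a "first valid pixel" mosaic over a stack of tiles. It only illustrates the concept and is not stackstac's implementation: the helper `mosaic_first_valid` and the tiny 2x2 grids are hypothetical, and NaN is assumed to mark missing data.

```python
import math


def mosaic_first_valid(tiles):
    """Flatten equally shaped 2-D grids, keeping the first non-NaN value per pixel."""
    rows, cols = len(tiles[0]), len(tiles[0][0])
    out = [[math.nan] * cols for _ in range(rows)]
    for tile in tiles:
        for r in range(rows):
            for c in range(cols):
                # Fill a pixel only if it is still empty and this tile has data there.
                if math.isnan(out[r][c]) and not math.isnan(tile[r][c]):
                    out[r][c] = tile[r][c]
    return out


nan = math.nan
# Hypothetical tiles on a shared grid: each one covers a different half.
west = [[1.0, nan], [2.0, nan]]
east = [[nan, 3.0], [nan, 4.0]]

merged = mosaic_first_valid([west, east])
print(merged)  # [[1.0, 3.0], [2.0, 4.0]]
```

With a real stack, the equivalent step would presumably be a call to stackstac.mosaic on the stacked array along its item dimension; check the stackstac documentation for the exact arguments.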

solomon-negusse commented 2 years ago

Thanks much for pointing me in the right direction, @gjoseph92. The tile assets in my STAC catalog had unique keys; using data as the key for all of them resolved the issue of the number_of_tiles bands. I also ended up using odc-stac for its groupby, which sorted out the time dimension.
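For anyone hitting the same thing, the asset-key fix can be sketched on plain STAC item dictionaries. This is an illustration under assumptions, not the exact code used here: the items, hrefs, and the helpers `band_keys` and `rename_single_asset` are hypothetical, and `band_keys` only approximates how the band coordinate ends up with one entry per distinct asset key.

```python
from copy import deepcopy


def band_keys(items):
    """Distinct asset keys across all items, in first-seen order."""
    seen = {}
    for item in items:
        for key in item["assets"]:
            seen.setdefault(key, None)
    return list(seen)


def rename_single_asset(items, new_key="data"):
    """Return copies of the items with their single asset stored under new_key."""
    renamed = []
    for item in items:
        if len(item["assets"]) != 1:
            raise ValueError(f"item {item['id']!r} does not have exactly one asset")
        item = deepcopy(item)  # leave the caller's items untouched
        (asset,) = item["assets"].values()
        item["assets"] = {new_key: asset}
        renamed.append(item)
    return renamed


# Hypothetical items: three tiles, each with one asset under a tile-specific key.
items = [
    {"id": f"tile_{i}", "assets": {f"tile_{i}": {"href": f"https://example.com/tile_{i}.tif"}}}
    for i in range(3)
]

print(band_keys(items))  # ['tile_0', 'tile_1', 'tile_2'] -> one band per tile
fixed = rename_single_asset(items)
print(band_keys(fixed))  # ['data'] -> a single band
```

Items renamed this way share one asset key, so the stacked array should come out with a single band instead of number_of_tiles bands.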