gjoseph92 / stackstac

Turn a STAC catalog into a dask-based xarray
https://stackstac.readthedocs.io
MIT License

Use smaller internal data format when possible #63

Open kylebarron opened 3 years ago

kylebarron commented 3 years ago

From looking at some examples, it appears that data is always loaded into float64 arrays. For example, in https://github.com/gjoseph92/stackstac/blob/5f984b211993380955b5d3f9eba3f3e285f6952c/examples/show.ipynb, loading the RGB bands of a Sentinel 2 asset (rgb = stack.sel(band=["B04", "B03", "B02"]).persist()) creates an xarray DataArray of dtype float64. It seems to me that you could improve performance (or at least reduce memory usage) by using a smaller data type when possible.
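To put a rough number on the memory point, here is a back-of-the-envelope sketch. It assumes the three 10 m Sentinel 2 bands of one full tile (10980 x 10980 pixels each) and that the source pixels are stored as uint16; the helper name `array_bytes` is made up for illustration.

```python
# Bytes per element for the two dtypes being compared.
ITEMSIZE = {"uint16": 2, "float64": 8}


def array_bytes(shape, dtype):
    """Return the in-memory size in bytes of a dense array of this shape and dtype."""
    n_elements = 1
    for dim in shape:
        n_elements *= dim
    return n_elements * ITEMSIZE[dtype]


# Assumption: three 10 m bands of one full Sentinel 2 tile.
shape = (3, 10980, 10980)

for dtype in ("uint16", "float64"):
    gib = array_bytes(shape, dtype) / 2**30
    print(f"{dtype}: {gib:.2f} GiB")
```

Under these assumptions, the float64 array takes four times the memory of the uint16 one.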

You could look at the raster:bands object if it exists to optimize the xarray data type. If the extension doesn't exist, or if the bands have mixed dtypes, then fall back to float64?
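A minimal sketch of what that selection logic could look like, assuming each asset is a plain dict that may carry a `raster:bands` list whose entries have a `data_type` field. The function name `pick_dtype` is hypothetical and not part of stackstac, and this sketch ignores `scale`, `offset`, and `nodata` entirely.

```python
from typing import Any, Iterable, Mapping

FALLBACK = "float64"

# raster extension data_type values that map directly onto a NumPy dtype name.
SIMPLE_TYPES = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64",
}


def pick_dtype(assets: Iterable[Mapping[str, Any]]) -> str:
    """Return the single dtype shared by all bands, or float64 if that is not possible."""
    found = set()
    for asset in assets:
        bands = asset.get("raster:bands")
        if not bands:
            # Extension missing on this asset: fall back.
            return FALLBACK
        for band in bands:
            data_type = band.get("data_type")
            if data_type not in SIMPLE_TYPES:
                # Missing, complex, or "other" data type: fall back.
                return FALLBACK
            found.add(data_type)
    # Mixed dtypes (or no assets at all) also fall back.
    return found.pop() if len(found) == 1 else FALLBACK


red = {"raster:bands": [{"data_type": "uint16"}]}
green = {"raster:bands": [{"data_type": "uint16"}]}
mask = {"raster:bands": [{"data_type": "uint8"}]}

print(pick_dtype([red, green]))        # same dtype everywhere
print(pick_dtype([red, mask]))         # mixed dtypes
print(pick_dtype([red, {"href": ""}])) # extension missing on one asset
```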

gjoseph92 commented 3 years ago

Using float64 by default was an intentional choice because

Note that you can control the dtype using the dtype= parameter to stackstac.stack. You'll also want to set rescale=False if doing this, as noted in the docs.
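As a rough sketch of that manual route, assuming `items` is a collection of Sentinel 2 STAC items you have already searched for, and that the source data is uint16. The wrapper name `stack_as_uint16` and the choice of `fill_value=0` are illustrative assumptions: the default fill value is NaN, which an integer dtype cannot hold, so some integer fill value has to be supplied.

```python
def stack_as_uint16(items):
    """Stack the RGB assets of `items` without promoting the pixels to float64."""
    # Imported lazily so this sketch can be defined without the libraries installed.
    import numpy as np
    import stackstac

    return stackstac.stack(
        items,
        assets=["B04", "B03", "B02"],
        dtype=np.dtype("uint16"),  # assumption: the source rasters are uint16
        fill_value=0,              # assumption: 0 is acceptable for missing pixels
        rescale=False,             # keep raw values; do not apply scale/offset
    )
```

With `rescale=False` the values stay as the raw stored integers, so any scale and offset from the metadata would need to be applied by hand later if physical units are required.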

I'd really like to make this automatic though. I think raster:bands is the missing link to allow us to do that. Having data_type, scale, offset, and nodata in metadata really changes the game!