Open kylebarron opened 3 years ago
Using float64 by default was an intentional choice because `raster:bands` didn't exist when I wrote everything a few months ago, so there was no way to know the native dtype of an asset without actually fetching data. But we have to know the dtype ahead of time to construct the dask array correctly. So float64 seemed like the safest default, since anything else could lose precision.
stackstac also sets `rescale=True` by default, which uses the `scale_offset` metadata defined within each GeoTIFF (not known within the STAC metadata) to apply rescaling. So even if the asset were uint16 to begin with, it could become float64 after rescaling, which is yet another reason why that default made sense.
However, from what I've seen, nobody really sets the `scale_offset` metadata at the GeoTIFF level, so I think this might be reasonable to remove. It would make thinking about dtypes a lot easier.
Note that you can control the dtype using the `dtype=` parameter to `stackstac.stack`. You'll also want to set `rescale=False` if doing this, as noted in the docs.
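As a rough sketch of that workaround (not taken from the stackstac docs): the helper name `stack_native_uint16` is hypothetical, and `fill_value=0` is an assumption added here because stackstac's default fill value is NaN, which an integer dtype cannot represent. Pick a fill value that matches your data's nodata convention.

```python
def stack_native_uint16(items):
    """Stack STAC items as uint16 instead of the float64 default.

    Hypothetical helper for illustration. Imports are done lazily so the
    sketch can be read and imported without stackstac installed.
    """
    import numpy as np
    import stackstac

    return stackstac.stack(
        items,
        dtype=np.dtype("uint16"),  # keep the asset's integer dtype
        fill_value=0,              # assumption: 0 marks missing pixels
        rescale=False,             # skip scale/offset, which would need floats
    )
```

This only makes sense when you already know every requested asset is uint16; otherwise values may be cast silently.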
I'd really like to make this automatic though. I think `raster:bands` is the missing link to allow us to do that. Having `data_type`, `scale`, `offset`, and `nodata` in metadata really changes the game!
From looking at some examples, it appears that data is always loaded to float64 arrays. For example in https://github.com/gjoseph92/stackstac/blob/5f984b211993380955b5d3f9eba3f3e285f6952c/examples/show.ipynb, loading the RGB bands of a Sentinel 2 asset (`rgb = stack.sel(band=["B04", "B03", "B02"]).persist()`) creates an xarray dataset of type float64. It seems to me that you could improve performance (or at least memory usage) if you were able to use a smaller data type when possible.
You could look at the `raster:bands` object if it exists to optimize the xarray data type. If the extension doesn't exist, or if the bands have mixed dtypes, then fall back to float64?
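
To make the proposal concrete, here is a minimal sketch of that fallback logic over plain STAC item dictionaries. It is not stackstac's implementation: `infer_dtype` and `FALLBACK_DTYPE` are hypothetical names, and the check on `scale`/`offset` is an added assumption (rescaled integers would need a float dtype anyway).

```python
from typing import Any, Iterable, Mapping

FALLBACK_DTYPE = "float64"  # hypothetical constant: the current default


def infer_dtype(
    items: Iterable[Mapping[str, Any]],
    asset_keys: Iterable[str],
    rescale: bool = True,
) -> str:
    """Return the single dtype shared by all requested bands, else float64."""
    asset_keys = list(asset_keys)
    dtypes = set()
    for item in items:
        assets = item.get("assets", {})
        for key in asset_keys:
            asset = assets.get(key)
            if asset is None:
                continue  # this item simply lacks the asset
            bands = asset.get("raster:bands")
            if not bands:
                return FALLBACK_DTYPE  # extension missing: dtype unknown
            for band in bands:
                data_type = band.get("data_type")
                if data_type is None:
                    return FALLBACK_DTYPE
                # Assumption: a non-trivial scale/offset yields float values.
                if rescale and (
                    band.get("scale", 1) != 1 or band.get("offset", 0) != 0
                ):
                    return FALLBACK_DTYPE
                dtypes.add(data_type)
    # Exactly one dtype seen: use it. Mixed or none: fall back.
    return dtypes.pop() if len(dtypes) == 1 else FALLBACK_DTYPE
```

The returned string is a `raster:bands` `data_type` name; common ones such as `uint16` or `float32` match NumPy dtype names, but values like `other` would need separate handling.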