gjoseph92 / stackstac

Turn a STAC catalog into a dask-based xarray
https://stackstac.readthedocs.io
MIT License
247 stars 49 forks source link

Use reprojected `geometry` instead of `bbox` to determine extent #5

Open gjoseph92 opened 3 years ago

gjoseph92 commented 3 years ago

When an asset or item doesn't have any proj: fields, we're falling back on the WGS84 bbox field on the Item to determine extent, for the purposes of calculating reasonable bounds for the output array: https://github.com/gjoseph92/stackstac/blob/b652a07f9b2ae27235aea4db4ef0f1f594fd8941/stackstac/prepare.py#L202-L215

Reprojecting a lat-lon bbox to the output CRS is less accurate that reprojecting the geometry, then taking its bounds in the output CRS—you're stacking envelopes on envelopes.

The only question is whether to add a dependency to deal with the GeoJSON, or just write something ourselves. I imagine the overhead of making a GEOS object with shapely/pygeos is a bit high when we just need a list of coordinates to pass to pyproj; could use geojson.utils.coords; but it's simple enough we should probably just implement it ourselves.

matthewhanson commented 3 years ago

I've been wondering about this and if you can just really use the bbox. Is it just to create the output shape?

The proj fields provide the geotransform and the shape of the array as it lives on disk. But the bbox is not guaranteed to be the same extent of the file, in fact it should not be unless the data fills the whole scene. The STAC geometry is the data footprint. In the case Landsat path-row it's really not that bad of an approximation since the bbox of the data is about the same as the whole file. But in the case of retiled data, such as Sentinel-2, you could have some tiles where the actual data is just a some sliver in the corner of the file. In this cases the bbox is completely separate than the file itself.