Open hrodmn opened 1 year ago
Thanks for reporting @hrodmn, and thanks for the nice explanation. I think there are 2 things in this issue:
This definitely needs to be fixed! Just densifying the bounding-box before reprojecting it should solve the issue you're facing here.
If stackstac were to read
proj:geometry
instead ofproj:bbox
to check for overlap between items and the targetbbox
we could avoid this edge case!
This is interesting, because not using geometry
to calculate overlap was a somewhat intentional choice. My reasoning was that the geometry
field didn't feel as precise as things like proj:bbox
or proj:transform
+ proj:shape
, since geometry
is always lat-lon, whereas the other fields are in the same projection as the data and also seem more strictly defined. My perception a couple years ago, when I wrote all this, was that geometry
was more for display purposes but might not always be pixel-level accurate.
Indeed, in some cases we can see STAC data in the wild where the geometry
field doesn't do densification either. For example, look at MODIS data (like mentioned in the stactools blog post) on Planetary Computer:
The blue path (geometry
) doesn't match the actual data. If stackstac just trusted the geometry
field, we'd get the same error you're mentioning here.
Now, that stactools post is really interesting though, because it includes reading the underlying GeoTIFF and clipping the geometry
vector to where there's actually raster data. This is super valuable, because it adds additional information to the metadata that you don't otherwise have, and would let us better filter out unnecessary items.
If that approach becomes widely used, that would be a very compelling reason for stackstac to filter on geometry
. Of course, it also has the downside that mis-defined geometries (like shown above) would then cause issues.
I could see even using a hacky heuristic, that if the geometry
contains >4 points, trust it since it provides additional information, otherwise just use proj:bbox
since it's slightly more accurate.
Anyway, what to do:
geometry
instead of bbox
: this is a bigger code change, introduces the potential for bad geometries to cause a bad experience, and only adds value over densification if providers are using geometry as a vector nodata
mask. I'm very into it if providers start doing that, but probably won't work on that right now.Actually, I realize that there's a much simpler argument for filtering by geometry
:
Say you're loading this MODIS data into EPSG:4326. Your AOI, in lat-lon, does not actually overlap with the footprint of the asset. But the bounding box of the asset, in lat-lon, includes a ton of empty space and does overlap with your AOI.
If we filter by "bounding boxes intersect" (current behavior), you'll still load this scene, and it will be all empty. If we switched to filtering by geometry intersection—whether that geometry is actually the geometry
field, or a polygon we generate ourselves from the densified proj:bbox
, etc.—we'd be able to drop this unnecessary item. That's pretty compelling.
Still don't think I'll do this immediately, but it does seem worth doing.
TLDR: Would it be possible to filter STAC items using
proj:geometry
instead ofproj:bbox
in prepare_items? This would make it possible to use more than four reprojected polygon coordinates to check overlap between the targetbbox
and the STAC items.I have a STAC collection where the items/assets are stored in 15 degree squares in epsg:4326. This might be a terrible storage scheme, but it creates a situation where data gets inadvertently dropped when loading it with
stackstac.stack
. I can make a reprex if it would be useful but for now I will just start with a description of the problem.My AOI is fairly small and is defined in epsg:5070. This AOI happens to straddle the 45th parallel, so it is on the border between two of my STAC items. When I load the data via
stackstac
, one of the items gets dropped presumably because, after reprojecting to epsg:5070, the itemproj:bbox
does not overlap the AOI'sbbox
.Here is a python script that illustrates the situation:
After reprojection from epsg:4326 to epsg:5070, the boundary between the STAC item geometries has shifted north of the 45th parallel!
If I reproject the aoi to epsg:4326, everything looks as expected!
@pjhartzell recently wrote an article about this exact problem and shared how the stactools package addresses it by densifying the STAC item geometry. I tried that and it seems to fix my issue. Now the STAC item geometries still track the 45th parallel after being reprojected.
If stackstac were to read
proj:geometry
instead ofproj:bbox
to check for overlap between items and the targetbbox
we could avoid this edge case!