Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

CRS handling of load_stac + filter_bbox? #753

Open soxofaan opened 2 months ago


reported by @llanduyt

I want to use a local dataset to mask my openEO datacube. To do so, I created a STAC collection item and used load_stac. Everything seems to work, but when I specify my bounds in a CRS different from that of the source data, the wrong data is returned. When I specify my bounds in that other CRS but first reproject explicitly using resample_spatial, the result is correct. Is this expected behavior? The source data (from Geopunt) are .tif files with an accompanying .tfw; could this be why the implicit reprojection doesn't work? If resample_spatial is needed: is it computationally heavy, i.e. would it be better to reproject my source data first?

This works as expected:

```python
# Reproject explicitly to the bbox CRS (crs_epsg) before filtering: returns the expected data
datacube = connection.load_stac(url="/data/users/Public/landuytl/geo-informed/Data/Boswijzer_2021/stac/Boswijzer_2021/collection.json") \
    .resample_spatial(resolution=10, projection=crs_epsg, method="near") \
    .filter_bbox(bbox=spatial_extent)
```

This doesn't:

```python
datacube = connection.load_stac(url="/data/users/Public/landuytl/geo-informed/Data/Boswijzer_2021/stac/Boswijzer_2021/collection.json") \
    .filter_bbox(bbox=spatial_extent)
```

Local data: /data/users/Public/landuytl/geo-informed/Data/Boswijzer_2021/GeoTIFF
STAC collection: /data/users/Public/landuytl/geo-informed/Data/Boswijzer_2021/stac/Boswijzer_2021/collection.json

...

with

```python
crs_epsg = 32631  # EPSG:32631 = WGS 84 / UTM zone 31N
aoi_bounds = [504000, 5645000, 518000, 5655000]
spatial_extent = dict(zip(["west", "south", "east", "north"], aoi_bounds))
spatial_extent["crs"] = crs_epsg
```
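For the question of whether resample_spatial is computationally heavy, a rough first estimate is the size of the 10 m target grid it has to produce over this AOI. The sketch below computes that from aoi_bounds; `grid_shape` is a hypothetical helper, and it assumes square pixels aligned with the bounds, ignoring any padding or tiling the backend may add.

```python
import math


def grid_shape(bounds, resolution):
    """Return (rows, cols) of a north-up grid covering bounds = [west, south, east, north]."""
    west, south, east, north = bounds
    cols = math.ceil((east - west) / resolution)   # pixels along x
    rows = math.ceil((north - south) / resolution)  # pixels along y
    return rows, cols


aoi_bounds = [504000, 5645000, 518000, 5655000]  # same AOI as above, in EPSG:32631
rows, cols = grid_shape(aoi_bounds, resolution=10)
print(rows, cols, rows * cols)  # 1000 1400 1400000
```

For this AOI that is 1000 x 1400 pixels, about 1.4 million per band and timestep; the actual cost on the backend will also depend on how the EPSG:31370 source tiles map onto this grid.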

About the source CRS:

The source CRS is EPSG:31370 (Belgian Lambert 72). Another peculiarity is that the data are delivered as .tif files with a corresponding .tfw (world file), which holds the image transform. I'm not sure whether this affects the reprojection, though, since it works when reprojecting explicitly.
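Because the georeferencing of these files lives in the .tfw sidecar rather than in the GeoTIFF itself, one thing worth checking is that the STAC item describes the same grid. The sketch below parses a world file and derives the corner-origin affine transform and native-CRS bounding box that a STAC item using the projection extension could carry. It is an illustration under assumptions: the world-file contents and raster size are hypothetical, `read_world_file`, `world_file_to_affine` and `footprint` are hypothetical helpers, and whether load_stac relies on these particular fields is not confirmed here.

```python
def read_world_file(text):
    """Parse world-file (.tfw) text into its six parameters (A, D, B, E, C, F).

    A world file lists, one per line: A = pixel size in x, D and B = rotation
    terms, E = pixel size in y (usually negative), and C, F = x, y of the
    *center* of the upper-left pixel.
    """
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError(f"expected 6 values in world file, got {len(values)}")
    return tuple(values)


def world_file_to_affine(A, D, B, E, C, F):
    """Shift the pixel-center origin to the upper-left pixel *corner*.

    Returns [a, b, c, d, e, f] such that x = a*col + b*row + c and
    y = d*col + e*row + f, with (col, row) = (0, 0) at the raster's corner.
    """
    return [A, B, C - A / 2 - B / 2, D, E, F - D / 2 - E / 2]


def footprint(affine, width, height):
    """Bounding box [xmin, ymin, xmax, ymax] of the raster's four outer corners."""
    a, b, c, d, e, f = affine
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    xs = [a * col + b * row + c for col, row in corners]
    ys = [d * col + e * row + f for col, row in corners]
    return [min(xs), min(ys), max(xs), max(ys)]


# Hypothetical 10 m world file and raster size (not the actual Boswijzer data).
tfw_text = """10.0
0.0
0.0
-10.0
100005.0
199995.0
"""
width, height = 1000, 500

affine = world_file_to_affine(*read_world_file(tfw_text))
bbox = footprint(affine, width, height)

# Projection-extension fields a STAC item could carry for this file
# (extension v1.x naming; v2 uses "proj:code": "EPSG:31370" instead of proj:epsg).
proj_fields = {
    "proj:epsg": 31370,  # Belgian Lambert 72, the source CRS reported above
    "proj:shape": [height, width],  # [rows, cols]
    "proj:transform": affine,
    "proj:bbox": bbox,
}
print(bbox)  # [100000.0, 195000.0, 110000.0, 200000.0]
```

Comparing such values with the actual STAC item metadata is a quick consistency check on the georeferencing; it does not by itself explain why only the implicit reprojection path misbehaves.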