Closed VictorVerhaert closed 1 month ago
Pushed a quick fix that circumvents the problem in the case of `load_stac`; it now looks like this on staging:
The problem occurs in regions where two features meet and SpaceTimeKeys typically overlap both of these features.
In this case, the SpaceTimeKey (purple) is `fullyContained` within the bbox of the top feature (red), so only this GeoTiff asset is taken into account and the bottom one is discarded. Unfortunately, the bbox does not match the actual footprint of the asset, and the asset does not have data to fully cover the SpaceTimeKey: hence the gap.
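To make the failure mode concrete, here is a minimal sketch of a bbox-only selection like the one described above. It is not the actual driver code: `BBox`, `select_features` and all coordinates are hypothetical and only illustrate the "fully contained in one bbox, so skip the rest" shortcut.

```python
from typing import Dict, List, NamedTuple


class BBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float

    def contains(self, other: "BBox") -> bool:
        return (self.west <= other.west and self.south <= other.south
                and self.east >= other.east and self.north >= other.north)

    def intersects(self, other: "BBox") -> bool:
        return not (other.west > self.east or other.east < self.west
                    or other.south > self.north or other.north < self.south)


def select_features(key: BBox, features: Dict[str, BBox]) -> List[str]:
    """Return the ids of the features to read for one key extent."""
    candidates = [fid for fid, bbox in features.items() if bbox.intersects(key)]
    for fid in candidates:
        if features[fid].contains(key):
            # Shortcut: a single feature is assumed to cover the whole key.
            # This is only safe if the bbox equals the real data footprint.
            return [fid]
    return candidates


# Hypothetical coordinates: two features that meet around y = 10.
features = {
    "top": BBox(0.0, 9.0, 10.0, 20.0),
    "bottom": BBox(0.0, 0.0, 10.0, 10.2),
}
key = BBox(4.0, 9.5, 6.0, 10.5)

selected = select_features(key, features)
print(selected)  # ['top']
```

If the data of `top` does not actually reach down to its bbox edge, the part of the key below the real footprint stays empty even though `bottom` has data there.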
@jdries is this optimization something that we want to have for `load_stac` as well (I'm assuming yes)? The quick fix I did essentially bypasses it for `load_stac`.
Otherwise, the real fix is twofold:
1) `load_stac` should consider a STAC Item's `geometry` rather than its `bbox`; this should not be hard to implement.
2) the geometries in the STAC Items in this collection do not match their asset's actual footprint so they will have to be fixed (reingested?).
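A rough sketch of what point 1 could look like, assuming plain STAC Item dicts with a `Polygon` geometry. The helper names (`item_ring`, `point_in_ring`, `key_corners_inside`) and the example item are made up for illustration; testing only the four key corners is a simplification that is sufficient for convex footprints only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_ring(point: Point, ring: Sequence[Point]) -> bool:
    """Ray casting point-in-polygon test on a closed ring."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def item_ring(item: dict) -> List[Point]:
    """Prefer the Item's geometry (exterior ring); fall back to its bbox."""
    geometry = item.get("geometry")
    if geometry and geometry.get("type") == "Polygon":
        return [tuple(p) for p in geometry["coordinates"][0]]
    west, south, east, north = item["bbox"]
    return [(west, south), (west, north), (east, north), (east, south), (west, south)]


def key_corners_inside(key_bbox: Sequence[float], item: dict) -> bool:
    """True if all four corners of the key extent lie inside the item footprint."""
    west, south, east, north = key_bbox
    ring = item_ring(item)
    corners = [(west, south), (west, north), (east, north), (east, south)]
    return all(point_in_ring(c, ring) for c in corners)


# Hypothetical item: a tilted footprint inside an axis-aligned bbox.
bbox_only_item = {"bbox": [0.0, 0.0, 10.0, 10.0]}
item_with_geometry = {
    "bbox": [0.0, 0.0, 10.0, 10.0],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[1, 0], [10, 1], [9, 10], [0, 9], [1, 0]]],
    },
}
key = (0.2, 0.2, 2.0, 2.0)  # near a bbox corner that the footprint does not cover

print(key_corners_inside(key, bbox_only_item))      # True
print(key_corners_inside(key, item_with_geometry))  # False
```

With the geometry taken into account, the key is no longer considered fully covered by this item, so neighbouring items would still be read.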
With the footprints fixed, both assets will be considered, which removes the gap:
I'm not sure if we need the optimization: most products are generated without any overlap. The huge amount of overlap applied to sentinel-2 is rather the exception. In addition to that, we do the optimization for sentinel-2, because it is such a commonly used collection. For load_stac, it is probably better to be on the safe side and load a bit more data.
I do believe that we should consider fixing the footprints, and also using the geometry rather than bbox should be a good idea in general.
@VictorVerhaert is it Stijn C. that is responsible for https://stac.openeo.vito.be/collections/tree_cover_density_2018 or who should I bother?
@bossie I made that collection myself. Can you clarify what exactly is wrong? Is it just the bbox that doesn't match?
@VictorVerhaert At least `bbox` and `geometry`; I haven't checked `proj:bbox` and `proj:geometry`.
This item for example: https://stac.openeo.vito.be/collections/tree_cover_density_2018/items/TCD_2018_010m_E44N27_03035_v020 reports a `geometry` of:
{
"type": "Polygon",
"coordinates": [
[
[
11.064548187608006,
47.38783029804821
],
[
11.064548187608006,
48.3083796083107
],
[
12.36948893966052,
48.3083796083107
],
[
12.36948893966052,
47.38783029804821
],
[
11.064548187608006,
47.38783029804821
]
]
]
}
whereas I would expect it to be something like:
{
"type": "Polygon",
"coordinates": [
[
[
11.046005504476401,
47.40858428037738
],
[
11.707867449704809,
47.40021736186508
],
[
12.36948893966052,
47.38783030409527
],
[
12.390240820693707,
47.837566260620925
],
[
12.411462626880093,
48.28720072607632
],
[
11.738134164531402,
48.29984134090657
],
[
11.064548187608006,
48.30837961418922
],
[
11.055172953154765,
47.85853023272656
],
[
11.046005504476401,
47.40858428037738
]
]
]
}
such that it matches the actual footprint of the GeoTiff asset.
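As a quick sanity check on the two polygons above, the following sketch (helper names `ring_bbox` and `is_own_bbox` are mine, not from any library) tests whether a ring is nothing more than its own axis-aligned bounding rectangle. The coordinates are copied from the two geometries quoted above.

```python
from typing import List, Sequence, Tuple

Ring = Sequence[Sequence[float]]


def ring_bbox(ring: Ring) -> Tuple[float, float, float, float]:
    """Return (west, south, east, north) of a ring of [lon, lat] pairs."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def is_own_bbox(ring: Ring) -> bool:
    """True if every vertex is a corner of the ring's bounding rectangle."""
    west, south, east, north = ring_bbox(ring)
    return all(x in (west, east) and y in (south, north) for x, y in ring)


# Geometry currently reported by the item.
reported: List[List[float]] = [
    [11.064548187608006, 47.38783029804821],
    [11.064548187608006, 48.3083796083107],
    [12.36948893966052, 48.3083796083107],
    [12.36948893966052, 47.38783029804821],
    [11.064548187608006, 47.38783029804821],
]

# Geometry expected from the actual GeoTiff footprint.
expected: List[List[float]] = [
    [11.046005504476401, 47.40858428037738],
    [11.707867449704809, 47.40021736186508],
    [12.36948893966052, 47.38783030409527],
    [12.390240820693707, 47.837566260620925],
    [12.411462626880093, 48.28720072607632],
    [11.738134164531402, 48.29984134090657],
    [11.064548187608006, 48.30837961418922],
    [11.055172953154765, 47.85853023272656],
    [11.046005504476401, 47.40858428037738],
]

print(is_own_bbox(reported))  # True: the reported geometry is just a rectangle
print(is_own_bbox(expected))  # False: the real footprint is tilted
```

Note that the bounding rectangle of the expected footprint also extends further west and east than the reported rectangle, so the reported values do not look like a valid bbox of the footprint either.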
Disabled the optimization in case of `load_stac` (quick fix became real fix).
`load_stac` will take a STAC Item's `geometry` property into account as well (needs a recent `openeo-opensearch-client`).
When using `load_stac` on the following collection: https://stac.openeo.vito.be/collections/tree_cover_density_2018 (job_id: j-2405173083f249c2bcc9c07be6e65416), I get the following missing data:
From the `load_stac` API call STAC API GET https://stac.openeo.vito.be/search?limit=20&bbox=11.1427023295687%2C47.22033843316067%2C11.821519349155245%2C47.628952581107114&datetime=1970-01-01T00%3A00%3A00Z%2F2069-12-31T23%3A59%3A59.999000Z&collections=tree_cover_density_2018&fields= I fetched the two matching TIFF files (given different colors for clarity):![image](https://github.com/Open-EO/openeo-geopyspark-driver/assets/33786515/a3766f3d-9bc5-4122-b7e1-c36445cbad8a)
where you can see that the data from the red tile does exist but is not loaded correctly (the top line of the missing rectangle corresponds exactly to the dividing line between the two tiles).
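For readability, the percent-encoded search request quoted above can be decoded with the Python standard library. This only unpacks the logged URL; it does not contact the STAC API.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied verbatim from the logged STAC API call.
url = (
    "https://stac.openeo.vito.be/search?limit=20"
    "&bbox=11.1427023295687%2C47.22033843316067%2C11.821519349155245%2C47.628952581107114"
    "&datetime=1970-01-01T00%3A00%3A00Z%2F2069-12-31T23%3A59%3A59.999000Z"
    "&collections=tree_cover_density_2018&fields="
)

# parse_qs drops parameters with an empty value (here: fields) by default.
params = parse_qs(urlsplit(url).query)

west, south, east, north = (float(v) for v in params["bbox"][0].split(","))
start, end = params["datetime"][0].split("/")

print("bbox:", west, south, east, north)
print("datetime:", start, "to", end)
print("collections:", params["collections"])
```

The decoded bbox is the spatial extent that was requested, which helps when comparing it against the `bbox` and `geometry` of the returned items.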
used openeo code on CDSE: