Closed EmileSonneveld closed 4 months ago
next to adaptations of the wildcards or properties filters, it would be maybe good that the load_collection process can also handle a list of given tiles. e.g. sometimes the AOI is between two UTMzones but still only two Sentinel-2 tileIDs would be needed. So instead of a tileId with a wildcard it would be better to just pass a list of the specific tile names.
Would be good if this also works on CDSE
20x20 processing box, for each we have maximum 4 tile ids (most of the time 1 or 2 tiles). Would be better if we can give it a list of tileids instead of a wildstar.
you can find an example for the 20x20km processing boxes with their corresponding Sentinel-2 tileID's here: https://git.vito.be/projects/NCA/repos/extentmapping/browse/src/extentmapping/resources/LAEA-20km_add-info.gpkg
I will implement this extra filter in the properties part of here: https://git.vito.be/projects/NCA/repos/extentmapping/browse/src/extentmapping/openeo/preprocessing.py#288
The key point here is to find an openEO compatible way to specify the property filter.
lambda p: array_contains(data=["26SKH","26SLH"],value = p)
Note that wildcard matching is not part of the spec, but we can do things like:
lambda p: text_begins(data=p,pattern = "26S")
This should work on Terrascope:
properties = {"tileId": lambda tile_id: array_contains(["31UES", "31UFS"], tile_id)}
data_cube = (connection
.load_collection("SENTINEL2_L2A", properties=properties)
.filter_bbox(west=4.4158740490713804, south=51.4204485519121945, east=4.4613941769140322, north=51.4639210615473885)
.filter_temporal(["2024-04-24", "2024-04-25"])
.filter_bands(["B04", "B03", "B02"])
.save_result("GTiff"))
data_cube.execute_batch()
Adding a bbox filter is recommended because filtering by tile ID happens client side.
There's still some technical debt to be addressed.
array_contains
process is translated to an {"eq": ["31UES", "31UFS"]}
criterion;"eq"
is then ditched (not passed on to Scala);["31UES", "31UFS"]
) behaves as an OR (tileId == "31UES" || tileId == "31UFS")This is all very implicit and a better way would be to retain the operator i.e. an "in"
.
Some notes re: the technical debt:
org.openeo.geotrellissentinelhub.PyramidFactory
does support operators in addition to "eq" i.e. "lte" to be able to support filtering by cloud cover (https://jira.vito.be/browse/EP-4102);datacube_seq
: metadata_properties: util.Map[String, util.Map[String, Any]]
with an operator as opposed to metadata_properties: util.Map[String, Any]
without;flatten_eqs
parameter:@mbuchhorn Your usecase should work now.
Because we can only use asterisk wildcards on metadata we can get too many tiles passing the filter. As Marcel says:
26S*H
will not only find the needed26SKH
and26SLH
tiles, but also the unneeded26SHx
We could implement the "?" wildcard to match a single character. For example:
26S?H
. Or maybe allow for more complex matching by using something like filter_labels on product metadata.Related: https://github.com/Open-EO/openeo-opensearch-client/issues/25