Open-EO / openeo-opensearch-client

Simple opensearch client for openeo.
Apache License 2.0

Remove duplicate products in creodias catalog client #1

Closed: jdries closed this issue 1 year ago

jdries commented 1 year ago

This creodias query returns two products that are actually the same:

https://finder.creodias.eu/resto/api/collections/Sentinel1/search.json?maxRecords=10&startDate=2021-04-11T00%3A00%3A00Z&completionDate=2021-04-11T23%3A59%3A59Z&productType=GRD&sensorMode=IW&geometry=POLYGON((5.785404630537803+51.033953432779526%2C5.787426293119076+51.021746940265956%2C5.803195261253003+51.018694814851074%2C5.803195261253003+51.02912208053834%2C5.785404630537803+51.033953432779526))&sortParam=startDate&sortOrder=descending&status=all&dataset=ESA-DATASET

S1B_IW_GRDH_1SDV_20210411T054146_20210411T054211_026415_032740_6184
S1B_IW_GRDH_1SDV_20210411T054146_20210411T054211_026415_032740_EBF8

If we forward these results to openEO, we will process the same product twice. I propose to return only the most recent product, by publication date. To detect duplicates we cannot compare product IDs, because those actually differ, so we need to compare all other properties instead.
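A minimal Scala sketch of the proposed rule, assuming a simplified, hypothetical `Product` type (not the client's actual feature class) whose `properties` map holds only the fields that should match between duplicates. Fields that legitimately differ per copy, such as the ID and the publication date, are kept outside that map so they are excluded from the comparison. `Dedup.removeDuplicates` is likewise a hypothetical name.

```scala
import java.time.Instant

// Hypothetical, simplified stand-in for a parsed catalog search result.
// `properties` holds only the fields expected to be identical between
// duplicates; the ID and publication date are kept apart on purpose.
final case class Product(
    id: String,
    published: Instant,
    properties: Map[String, String]
)

object Dedup {
  // For every group of products with identical `properties`, keeps only
  // the most recently published one, preserving the original result order.
  def removeDuplicates(products: Seq[Product]): Seq[Product] = {
    val keep: Set[Product] = products
      .groupBy(_.properties)
      .values
      .map(group => group.maxBy(_.published.toEpochMilli))
      .toSet
    products.filter(keep.contains).distinct
  }
}
```

Grouping on the full property map means two results are only treated as duplicates when every compared property matches, which follows the "look at all other properties" rule rather than relying on the ID.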

Design-wise, we could add this as a postprocessing function that can optionally be applied to the results of a catalog search.
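One way to sketch the optional postprocessing idea is as a plain pipeline of functions over the search results. `CatalogPostprocessing`, `Postprocessor` and `applyAll` are hypothetical names, not part of the existing client; a deduplication step would be one such function.

```scala
// Hypothetical hook: a postprocessing step maps a list of search results
// to a (possibly filtered) list of search results.
object CatalogPostprocessing {
  type Postprocessor[A] = Seq[A] => Seq[A]

  // Applies the given steps in order. With no steps the results pass
  // through unchanged, which keeps any deduplication strictly opt-in.
  def applyAll[A](results: Seq[A], steps: Seq[Postprocessor[A]]): Seq[A] =
    steps.foldLeft(results)((acc, step) => step(acc))
}
```

Keeping the steps separate from the response parsing means callers that want the raw catalog output still get it unchanged.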

jdries commented 1 year ago

Current parsing of these search results happens here: https://github.com/Open-EO/openeo-opensearch-client/blob/master/src/main/scala/org/openeo/opensearch/OpenSearchResponses.scala#L280

jdries commented 1 year ago

Solving this issue will also help fix this one: https://github.com/Open-EO/openeo-geopyspark-driver/issues/229

EmileSonneveld commented 1 year ago

Need to wait until this is tested in deployment before closing.