Closed EmileSonneveld closed 1 year ago
Yes, it's important that we filter out only the products with this strange path, /eodata/Sentinel-2/MSI/L1C_N0500, and not the full N0500 processing baseline. Do you happen to know why regular deduplication doesn't seem to work? (Or does it?)
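A minimal sketch of such a path-based filter, assuming each product is identified by its path string. The function name `drop_l1c_n0500` and the first example path are hypothetical and not taken from the client code:

```python
from typing import Iterable, List

# Directory prefix quoted in the comment above. The trailing slash keeps the
# match limited to this directory, so other N0500 products are untouched.
STRANGE_PREFIX = "/eodata/Sentinel-2/MSI/L1C_N0500/"


def drop_l1c_n0500(paths: Iterable[str]) -> List[str]:
    """Return all product paths except those under the L1C_N0500 directory."""
    return [p for p in paths if not p.startswith(STRANGE_PREFIX)]


paths = [
    # Made-up product name, only used to exercise the filter.
    "/eodata/Sentinel-2/MSI/L1C_N0500/2021/10/03/example_product.SAFE",
    "/eodata/Sentinel-2/MSI/L2A_N0500/2021/10/03/S2B_MSIL2A_20211003T093039_N0500_R136_T35VLC_20230108T012942.SAFE",
]
kept = drop_l1c_n0500(paths)
```

Only the L1C_N0500 path is dropped here; the L2A_N0500 product stays, which is the distinction the comment asks for.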
I just spotted deduplication in action on these products, this time for L2A: "name":"org.openeo.opensearch.OpenSearchClient","levelname":"INFO","message":"Removing duplicated feature(s): '/eodata/Sentinel-2/MSI/L2A/2021/10/03/S2B_MSIL2A_20211003T093039_N0301_R136_T35VLC_20211003T124137.SAFE'. Keeping the Latest published one: '/eodata/Sentinel-2/MSI/L2A_N0500/2021/10/03/S2B_MSIL2A_20211003T093039_N0500_R136_T35VLC_20230108T012942.SAFE'\nRemoving duplicated feature(s): '/eodata/Sentinel-2/MSI/L2A/2021/10/26/S2B_MSIL2A_20211026T094029_N0301_R036_T35VLC_20211026T112534.SAFE'. Keeping the Latest published one: '/eodata/Sentinel-2/MSI/L2A_N0500/2021/10/26/S2B_MSIL2A_20211026T094029_N0500_R036_T35VLC_20230104T011328.SAFE'","created":1685108652.777403000,"filename":"OpenSearchResponses.scala","lineno":173,"user_id":"32cc7ddb-f7e1-4b1c-9796-f9fe39cb8feb","job_id":"j-b0bb044abbd948a0944141dd6c83b893"
The L1C_N0500 products don't have organisationName specified, while the normal L1C ones have "ESA" in this field. Because of this, they are not considered duplicates. If I ignore the organisationName field in this case, the L1C_N0500 products will survive the deduplication, because they were published years later.
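To illustrate the effect described above, here is a rough sketch of "keep the latest published" deduplication with and without organisationName in the comparison key. This is not the actual logic in OpenSearchResponses.scala: the dict-based records, the `tile`, `sensing_time` and `published` field names, and the dates are all assumptions made for the example.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical feature records: plain dicts, not the client's real data model.
Feature = Dict[str, Any]

# Assumed identity fields; only organisationName comes from the discussion.
KEY_FIELDS = ("tile", "sensing_time", "organisationName")


def dedup_key(feature: Feature, ignore: Tuple[str, ...] = ()) -> tuple:
    """Build the identity used to decide whether two features are duplicates."""
    return tuple(feature.get(f) for f in KEY_FIELDS if f not in ignore)


def deduplicate(features: Iterable[Feature], ignore: Tuple[str, ...] = ()) -> List[Feature]:
    """For each key, keep only the most recently published feature."""
    latest: Dict[tuple, Feature] = {}
    for feature in features:
        key = dedup_key(feature, ignore)
        current = latest.get(key)
        if current is None or feature["published"] > current["published"]:
            latest[key] = feature
    return list(latest.values())


# Illustrative values only.
features = [
    {"id": "L1C", "tile": "35VLC", "sensing_time": "20211003T093039",
     "organisationName": "ESA", "published": datetime(2021, 10, 3)},
    {"id": "L1C_N0500", "tile": "35VLC", "sensing_time": "20211003T093039",
     "organisationName": None, "published": datetime(2023, 1, 8)},
]

strict = deduplicate(features)                                 # organisationName compared
relaxed = deduplicate(features, ignore=("organisationName",))  # organisationName ignored
```

With the strict key, both records are kept because "ESA" and the missing value differ. With organisationName ignored, the two collapse into one and the later-published L1C_N0500 record is the one that remains.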
Maybe the N0500 products are the desired ones? They were published much later. I'll check Kibana to see what errors they give.
Or perhaps log a helpdesk issue for CloudFerro to explain these different processing baselines a bit better? Also, this 99.99 is a weird case...
For the moment, I consider 99.99 as an undefined baseline. But it is indeed not in the documentation. https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi/processing-baseline
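As a small sketch of what "treating 99.99 as undefined" could look like, assuming the baseline arrives as a float. The helper name `normalize_baseline` is hypothetical:

```python
import math
from typing import Optional

# Sentinel value treated as "baseline not defined", per the comment above.
UNDEFINED_BASELINE = 99.99


def normalize_baseline(value: float) -> Optional[float]:
    """Map the 99.99 sentinel to None and pass other baselines through."""
    if math.isclose(value, UNDEFINED_BASELINE):
        return None
    return value
```

Callers can then test for `None` instead of comparing against a magic number in several places.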
The L1C_N0500 products all have "processingBaseline": 5.0, and there is not much documentation for that one either. The others have "processingBaseline": 2.09. It looks like not all products have a version in the latest processing baseline.
This is the error I saw in Kibana for j-840b5b2b2faf4fe4b3bf122c69d0852b:
Traceback (most recent call last):
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py", line 804, in _get_layer_catalog
    opensearch_metadata[cid] = opensearch_instance(os_endpoint).get_metadata(collection_id=os_cid)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/opensearch.py", line 174, in get_metadata
    collection = self._get_collection(collection_id)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/opensearch.py", line 169, in _get_collection
    resp.raise_for_status()
  File "/opt/openeo/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://finder.creodias.eu/oldresto/resto/collections.json
While testing https://github.com/Open-EO/openeo-geotrellis-extensions/issues/164, it looks like ignoring organisationName is fine.
If instrument and orbitNumber turn out to also have undefined values in some products, they could be ignored too. For the moment, I'd like to keep this check.
With https://github.com/Open-EO/openeo-opensearch-client/pull/14 being merged, this PR is no longer needed.
GitHub says "All checks have failed" because the artifact was not uploaded through Jenkins. But the tests passed there.