Open-EO / openeo-opensearch-client

Simple opensearch client for openeo.
Apache License 2.0

Filter out N0500 products in CreoFeatureCollection. #13

Closed EmileSonneveld closed 1 year ago

EmileSonneveld commented 1 year ago

GitHub says "All checks have failed" because the artifact was not uploaded through Jenkins. But the tests passed there.

jdries commented 1 year ago

Yes, it's important that we filter out only the products with this strange path, /eodata/Sentinel-2/MSI/L1C_N0500, and not the full N0500 processing baseline. Do you happen to know why regular deduplication doesn't seem to work? (Or does it?)
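A minimal sketch of such a path-based filter, written in Python for illustration rather than the client's Scala. The helper name and the sample product paths are hypothetical; only the L1C_N0500 directory prefix comes from this thread.

```python
# Hypothetical helper: exclude only products stored under the L1C_N0500
# directory, and keep every other N0500 product (for example L2A_N0500).
EXCLUDED_PREFIX = "/eodata/Sentinel-2/MSI/L1C_N0500/"


def keep_feature(path: str) -> bool:
    """Return False only for products under the special L1C_N0500 path."""
    return not path.startswith(EXCLUDED_PREFIX)


# Illustrative paths; the .SAFE names are placeholders, not real products.
paths = [
    "/eodata/Sentinel-2/MSI/L1C_N0500/2021/10/03/EXAMPLE_A.SAFE",
    "/eodata/Sentinel-2/MSI/L1C/2021/10/03/EXAMPLE_B.SAFE",
    "/eodata/Sentinel-2/MSI/L2A_N0500/2021/10/03/EXAMPLE_C.SAFE",
]
kept = [p for p in paths if keep_feature(p)]
```

Matching on the directory prefix rather than on the baseline number is what keeps the L2A_N0500 products in the result.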

jdries commented 1 year ago

I just spotted deduplication in action on these products, this time for L2A:

"name":"org.openeo.opensearch.OpenSearchClient","levelname":"INFO","message":"Removing duplicated feature(s): '/eodata/Sentinel-2/MSI/L2A/2021/10/03/S2B_MSIL2A_20211003T093039_N0301_R136_T35VLC_20211003T124137.SAFE'. Keeping the Latest published one: '/eodata/Sentinel-2/MSI/L2A_N0500/2021/10/03/S2B_MSIL2A_20211003T093039_N0500_R136_T35VLC_20230108T012942.SAFE'\nRemoving duplicated feature(s): '/eodata/Sentinel-2/MSI/L2A/2021/10/26/S2B_MSIL2A_20211026T094029_N0301_R036_T35VLC_20211026T112534.SAFE'. Keeping the Latest published one: '/eodata/Sentinel-2/MSI/L2A_N0500/2021/10/26/S2B_MSIL2A_20211026T094029_N0500_R036_T35VLC_20230104T011328.SAFE'","created":1685108652.777403000,"filename":"OpenSearchResponses.scala","lineno":173,"user_id":"32cc7ddb-f7e1-4b1c-9796-f9fe39cb8feb","job_id":"j-b0bb044abbd948a0944141dd6c83b893"
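The processing baseline is also encoded in the product name itself, as the _Nxxyy_ token (N0301 versus N0500 in the log above). A small Python sketch of reading it from a path; the function name is hypothetical, and this is not how the client determines the baseline.

```python
import re
from typing import Optional, Tuple

# The Sentinel-2 product name carries the baseline as "_Nxxyy_".
BASELINE_RE = re.compile(r"_N(\d{2})(\d{2})_")


def baseline_from_path(path: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) of the baseline in the product name, or None."""
    # Look only at the last path component, so that a directory such as
    # "L2A_N0500" cannot be mistaken for the product's own baseline token.
    name = path.rstrip("/").rsplit("/", 1)[-1]
    match = BASELINE_RE.search(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


old = "/eodata/Sentinel-2/MSI/L2A/2021/10/03/S2B_MSIL2A_20211003T093039_N0301_R136_T35VLC_20211003T124137.SAFE"
new = "/eodata/Sentinel-2/MSI/L2A_N0500/2021/10/03/S2B_MSIL2A_20211003T093039_N0500_R136_T35VLC_20230108T012942.SAFE"
print(baseline_from_path(old))  # (3, 1)
print(baseline_from_path(new))  # (5, 0)
```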

EmileSonneveld commented 1 year ago

The L1C_N0500 products don't have organisationName specified, while the normal L1C ones have "ESA" in this field. As a result, they are not considered duplicates. If I ignore the organisationName field in this case, the L1C_N0500 products will survive the deduplication, because they were published years later.
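To make the effect concrete, here is a simplified Python sketch of "keep the latest published feature per key". It is not the code in OpenSearchResponses.scala: the key fields, the dictionary layout, and the sample values are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Feature = Dict[str, Optional[str]]

# Assumed key fields; the real client may compare a different set.
KEY_FIELDS = ("startDate", "tileId", "orbitNumber", "instrument", "organisationName")


def deduplicate(features: Iterable[Feature], ignore: Tuple[str, ...] = ()) -> List[Feature]:
    """Keep, for each key, only the feature with the latest 'published' value."""
    latest: Dict[tuple, Feature] = {}
    for feature in features:
        key = tuple(feature.get(k) for k in KEY_FIELDS if k not in ignore)
        current = latest.get(key)
        # ISO 8601 timestamps in the same format compare correctly as strings.
        if current is None or feature["published"] > current["published"]:
            latest[key] = feature
    return list(latest.values())


# Illustrative values only.
regular = {"startDate": "2021-10-03T09:30:39Z", "tileId": "35VLC", "orbitNumber": "136",
           "instrument": "MSI", "organisationName": "ESA",
           "published": "2021-10-03T12:41:37Z"}
n0500 = dict(regular, organisationName=None, published="2023-01-08T01:29:42Z")

strict = deduplicate([regular, n0500])
relaxed = deduplicate([regular, n0500], ignore=("organisationName",))
```

With organisationName in the key, both features survive because their keys differ. With it ignored, they collapse to one key and the later-published N0500 feature is the one kept.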

EmileSonneveld commented 1 year ago

Maybe the N0500 products are the desired ones? They are published much later. I'll check Kibana to see what errors they give.

jdries commented 1 year ago

Or perhaps log a helpdesk issue for CloudFerro to explain these different processing baselines a bit better? Also, this 99.99 is a weird case...

EmileSonneveld commented 1 year ago

For the moment, I consider 99.99 as an undefined baseline. But it is indeed not in the documentation. https://sentinels.copernicus.eu/web/sentinel/technical-guides/sentinel-2-msi/processing-baseline

The L1C_N0500 products all have "processingBaseline": 5.0, and there is not much documentation for that one either: [image]

The others have "processingBaseline": 2.09. It looks like not all products have a version in the latest processing baseline.
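Treating 99.99 as an undefined baseline could look roughly like the Python sketch below. The function name is hypothetical, and mapping the sentinel to None is an assumption about how "undefined" would be represented.

```python
from typing import Optional

# 99.99 is not listed in the processing baseline documentation, so this
# sketch treats it as "undefined" rather than as a real baseline.
UNDEFINED_BASELINE = 99.99


def normalize_baseline(value: Optional[float]) -> Optional[float]:
    """Map the 99.99 sentinel (and missing values) to None."""
    if value is None or abs(value - UNDEFINED_BASELINE) < 1e-9:
        return None
    return value


baselines = [normalize_baseline(v) for v in (5.0, 2.09, 99.99, None)]
```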

This is the error I saw in Kibana for j-840b5b2b2faf4fe4b3bf122c69d0852b:

Traceback (most recent call last):
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/layercatalog.py", line 804, in _get_layer_catalog
    opensearch_metadata[cid] = opensearch_instance(os_endpoint).get_metadata(collection_id=os_cid)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/opensearch.py", line 174, in get_metadata
    collection = self._get_collection(collection_id)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/opensearch.py", line 169, in _get_collection
    resp.raise_for_status()
  File "/opt/openeo/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://finder.creodias.eu/oldresto/resto/collections.json

EmileSonneveld commented 1 year ago

While testing https://github.com/Open-EO/openeo-geotrellis-extensions/issues/164, it looks like ignoring organisationName is fine. If instrument and orbitNumber turn out to also have undefined values in some products, they could be ignored too. For the moment, I'd like to keep this check.

EmileSonneveld commented 1 year ago

With https://github.com/Open-EO/openeo-opensearch-client/pull/14 merged, this PR is no longer needed.