CS-SI / eodag-cube

Data access for EODAG
Apache License 2.0

Generic driver broken for `S2_MSI_L2A` with recent processing baseline #55

Closed claytharrison closed 3 months ago

claytharrison commented 4 months ago

Describe the bug

Calling get_data on recently processed S2_MSI_L2A products returns mask data rather than band data for any band requested by its B** label.

Since at least processing baseline 04.00, the band masks in the QI_DATA folder of S2_MSI_L2A products have been stored as .jp2 files rather than as .gml files, as they were before. Their filenames contain the same band string as the actual image data filenames (e.g. QI_DATA/MSK_DETFOO_B04.jp2 compared to IMG_DATA/T33UWP_20221128T095341_B04_10m.jp2).

These mask files end up at the front of the glob result that get_data_address computes, so the function returns a mask path instead of the path to the actual band data.
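The collision can be reproduced without eodag. The snippet below is only a sketch: first_band_match is a hypothetical stand-in that assumes "match the band string, return the first hit", not the actual get_data_address code, and the file list is ordered by hand because the order of real glob results depends on the filesystem.

```python
from fnmatch import fnmatchcase

# Illustrative listing built from the two filenames quoted above.
# The mask is placed first on purpose to mirror the reported behaviour.
files = [
    "QI_DATA/MSK_DETFOO_B04.jp2",
    "IMG_DATA/T33UWP_20221128T095341_B04_10m.jp2",
]


def first_band_match(paths, band):
    """Return the first path containing the band string (assumed logic)."""
    matches = [p for p in paths if fnmatchcase(p, f"*{band}*")]
    return matches[0] if matches else None


# Both files match "*B04*", so the mask wins simply by coming first.
print(first_band_match(files, "B04"))
```

With .gml masks the pattern could be kept apart by file extension; once the masks are .jp2 as well, the band string alone no longer identifies the image file.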

Code To Reproduce

from eodag.utils.logging import setup_logging
setup_logging(verbose=3)

from eodag import EODataAccessGateway

# with credentials set in ~/.config/eodag/eodag.yml
dag = EODataAccessGateway()
product_type = 'S2_MSI_L2A'
latmin, latmax = 48.1, 48.35
lonmin, lonmax = 16.1, 16.6
extent = {'lonmin': lonmin, 'latmin': latmin, 'lonmax': lonmax, 'latmax': latmax}
provider = "cop_dataspace"
products, _ = dag.search(
    productType=product_type,
    geom=extent,
    start='2023-04-09',
    end='2023-04-24',
    provider=provider
)

# sort by sensing start time (characters 11 to 25 of the product title)
products = sorted(products, key=lambda p: p.properties["title"][11:26])
product = products[0]
product.download()

print(product.driver.get_data_address(product, "B04"))

Output

2024-04-25 13:32:18,502 eodag.config                     [DEBUG   ] (tid=140227231950656) Loading configuration from /home/charriso/micromamba/envs/intertwin/lib/python3.9/site-packages/eodag/resources/providers.yml
2024-04-25 13:32:19,275 eodag.config                     [INFO    ] (tid=140227231950656) Loading user configuration from: /home/charriso/.config/eodag/eodag.yml
2024-04-25 13:32:19,291 eodag.core                       [INFO    ] (tid=140227231950656) usgs: provider needing auth for search has been pruned because no crendentials could be found
2024-04-25 13:32:19,291 eodag.core                       [INFO    ] (tid=140227231950656) aws_eos: provider needing auth for search has been pruned because no crendentials could be found
2024-04-25 13:32:19,291 eodag.core                       [INFO    ] (tid=140227231950656) meteoblue: provider needing auth for search has been pruned because no crendentials could be found
2024-04-25 13:32:19,291 eodag.core                       [INFO    ] (tid=140227231950656) hydroweb_next: provider needing auth for search has been pruned because no crendentials could be found
2024-04-25 13:32:19,291 eodag.core                       [INFO    ] (tid=140227231950656) wekeo: provider needing auth for search has been pruned because no crendentials could be found
2024-04-25 13:32:19,291 eodag.core                       [INFO    ] (tid=140227231950656) creodias_s3: provider needing auth for search has been pruned because no crendentials could be found
2024-04-25 13:32:19,294 eodag.core                       [DEBUG   ] (tid=140227231950656) Opening product types index in /home/charriso/.config/eodag/.index
2024-04-25 13:32:19,303 eodag.core                       [INFO    ] (tid=140227231950656) Locations configuration loaded from /home/charriso/.config/eodag/locations.yml
2024-04-25 13:32:19,369 eodag.core                       [INFO    ] (tid=140227231950656) Searching product type 'S2_MSI_L2A' on provider: cop_dataspace
2024-04-25 13:32:19,369 eodag.search.base                [DEBUG   ] (tid=140227231950656) Mapping eodag product type to provider product type
2024-04-25 13:32:19,369 eodag.search.base                [DEBUG   ] (tid=140227231950656) Getting provider product type definition parameters for S2_MSI_L2A
2024-04-25 13:32:19,369 eodag.search.qssearch            [DEBUG   ] (tid=140227231950656) Building the query string that will be used for search
2024-04-25 13:32:19,369 eodag.product.metadata_mapping   [DEBUG   ] (tid=140227231950656) Retrieving queryable metadata from metadata_mapping
2024-04-25 13:32:19,370 eodag.search.qssearch            [INFO    ] (tid=140227231950656) Sending search request: http://catalogue.dataspace.copernicus.eu/resto/api/collections/Sentinel2/search.json?startDate=2023-04-09&completionDate=2023-04-24&geometry=POLYGON ((16.1000 48.1000, 16.1000 48.3500, 16.6000 48.3500, 16.6000 48.1000, 16.1000 48.1000))&productType=S2MSI2A&maxRecords=20&page=1&exactCount=1
2024-04-25 13:32:21,090 eodag.search.qssearch            [DEBUG   ] (tid=140227231950656) Adapting 12 plugin results to eodag product representation
2024-04-25 13:32:21,108 eodag.core                       [INFO    ] (tid=140227231950656) Found 12 result(s) on provider 'cop_dataspace'
2024-04-25 13:32:21,108 eodag.auth.keycloak              [DEBUG   ] (tid=140227231950656) fetching new access token
2024-04-25 13:32:21,478 eodag.download.base              [INFO    ] (tid=140227231950656) Download url: https://catalogue.dataspace.copernicus.eu/odata/v1/Products(da1a69f7-e64f-404e-89ff-bfe38927b606)/$value56: 0.00B [00:00, ?B/s]
2024-04-25 13:32:21,479 eodag.download.base              [INFO    ] (tid=140227231950656) Product already downloaded: /tmp/S2A_MSIL2A_20230410T100031_N0509_R122_T33UXP_20230410T135056
2024-04-25 13:32:21,479 eodag.download.base              [INFO    ] (tid=140227231950656) Extraction cancelled, destination directory already exists and is not empty: /tmp/S2A_MSIL2A_20230410T100031_N0509_R122_T33UXP_20230410T135056
S2A_MSIL2A_20230410T100031_N0509_R122_T33UXP_20230410T135056: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 4860.14file/s]
2024-04-25 13:32:21,479 eodag.product                    [DEBUG   ] (tid=140227231950656) Product location updated from 'https://catalogue.dataspace.copernicus.eu/odata/v1/Products(da1a69f7-e64f-404e-89ff-bfe38927b606)/$value' to 'file:///tmp/S2A_MSIL2A_20230410T100031_N0509_R122_T33UXP_20230410T135056/S2A_MSIL2A_20230410T100031_N0509_R122_T33UXP_20230410T135056.SAFE'
2024-04-25 13:32:21,479 eodag.product                    [INFO    ] (tid=140227231950656) Remote location of the product is still available through its 'remote_location' property: https://catalogue.dataspace.copernicus.eu/odata/v1/Products(da1a69f7-e64f-404e-89ff-bfe38927b606)/$value
/tmp/S2A_MSIL2A_20230410T100031_N0509_R122_T33UXP_20230410T135056/S2A_MSIL2A_20230410T100031_N0509_R122_T33UXP_20230410T135056.SAFE/GRANULE/L2A_T33UXP_A040731_20230410T100027/QI_DATA/MSK_DETFOO_B04.jp2

Environment:

Additional context

I haven't checked this with any other providers, and I'm not sure whether the issue goes back further than PB04.00. It's definitely fine in PB03.01. The S2_MSI_L2A_COG product from Copernicus Data Space also works fine for the same recent dates, but I guess that is handled completely differently on the backend.

sbrunato commented 3 months ago

@claytharrison and @npikall, you can now use a regex to get the appropriate band:

product.driver.get_data_address(product, r"^(?!.*MSK).*B04_10m.*$")

This is not yet available in a stable release, only on the develop branch, which you can install with:

pip install git+https://github.com/CS-SI/eodag-cube.git@develop
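To check what the regex above selects before installing the develop branch, it can be tried against plain filenames with Python's re module. This is a sketch only: how eodag-cube applies the pattern internally (full path or basename, match or search) is not stated in this thread, and the third filename is a hypothetical one added just to exercise the negative lookahead.

```python
import re

# Pattern from the comment above: reject anything containing "MSK",
# then require the band and resolution suffix.
pattern = re.compile(r"^(?!.*MSK).*B04_10m.*$")

candidates = [
    "QI_DATA/MSK_DETFOO_B04.jp2",
    "IMG_DATA/T33UWP_20221128T095341_B04_10m.jp2",
    "QI_DATA/MSK_EXAMPLE_B04_10m.jp2",  # hypothetical name, for illustration only
]

selected = [path for path in candidates if pattern.match(path)]
print(selected)
```

Only the IMG_DATA path is kept. The real mask name is already excluded by the _10m suffix; the (?!.*MSK) lookahead additionally guards against any mask file that might carry the same suffix.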