Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

filter by tile identifier #655

Closed jdries closed 8 months ago

jdries commented 8 months ago

Make the 'identifier' property work, allowing users to limit processing to a single product. Alternatively, we could immediately figure out the corresponding STAC property, to be forward compatible with a move to STAC?

Example code that doesn't work:

from openeo import collection_property
from openeo.processes import eq

bbox = get_bounding_box_from_polygon(wkt_polygon)
spatial_extent = {'west': bbox[1],
                   'east': bbox[3],
                   'south': bbox[0],
                   'north': bbox[2],
                   'crs': 4326
}

s2Image = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent=spatial_extent,
    identifier = '8208c2c6-998e-4af3-a274-d286ae8df3da'
)

s2Image.download("sentinel2.tiff")
jdries commented 8 months ago

Both in STAC and resto opensearch, features have an 'id'. You do not really search by it, but it is possible to simply construct a url pointing to the feature with that id. For instance:

https://catalogue.dataspace.copernicus.eu/resto/collections/SENTINEL-2/35db5e84-f029-524e-8e78-a3d986cd675b.json

https://services.terrascope.be/stac/collections/urn:eop:VITO:TERRASCOPE_S2_TOC_V2/items/urn:eop:VITO:TERRASCOPE_S2_TOC_V2:S2A_20150706T105016_31UER_TOC_V200

STAC also supports searching by multiple ids:

https://services.terrascope.be/stac/search?ids=urn:eop:VITO:TERRASCOPE_S2_TOC_V2:S2A_20150706T105016_31UDS_TOC_V200

CDSE also has search by productIdentifier, but you have to specify a full path: https://catalogue.dataspace.copernicus.eu/resto/api/collections/Sentinel2/search.json?productIdentifier=/eodata/Sentinel-2/MSI/L2A/2021/03/23/S2A_MSIL2A_20210323T104021_N9999_R008_T31UFS_20220927T033809

or finally, search by 'identifier': https://catalogue.dataspace.copernicus.eu/resto/api/collections/Sentinel2/search.json?identifier=35db5e84-f029-524e-8e78-a3d986cd675b

I'm inclined to support search by 'id' as this seems to be the most widely supported property.
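For illustration, the URL patterns above can be put together with a few small helpers. This is only a sketch: the function names are hypothetical, the root URLs are copied from the examples in this comment, and joining several ids with commas is an assumption based on how STAC GET search usually handles `ids`.

```python
from urllib.parse import urlencode

# Root URLs taken from the example links in this comment
RESTO_ROOT = "https://catalogue.dataspace.copernicus.eu/resto"
STAC_ROOT = "https://services.terrascope.be/stac"


def resto_feature_url(collection, feature_id):
    """URL pointing directly at one resto feature (hypothetical helper)."""
    return f"{RESTO_ROOT}/collections/{collection}/{feature_id}.json"


def stac_item_url(collection, item_id):
    """URL pointing directly at one STAC item (hypothetical helper)."""
    return f"{STAC_ROOT}/collections/{collection}/items/{item_id}"


def stac_search_by_ids_url(item_ids):
    """STAC search URL for one or more item ids, joined with commas (assumption)."""
    # safe=":," keeps URN-style ids readable instead of percent-encoding them
    query = urlencode({"ids": ",".join(item_ids)}, safe=":,")
    return f"{STAC_ROOT}/search?{query}"


print(resto_feature_url("SENTINEL-2", "35db5e84-f029-524e-8e78-a3d986cd675b"))
```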

jdries commented 8 months ago

@EmileSonneveld perhaps good to look into this one first, to have it rolled out in time?

EmileSonneveld commented 8 months ago

It is already possible to filter a specific product by productIdentifier. An example for 8208c2c6-998e-4af3-a274-d286ae8df3da:

s2Image = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent=spatial_extent,
    properties=dict(
        productIdentifier=lambda x: x == "/eodata/Sentinel-2/MSI/L2A/2021/03/23/S2A_MSIL2A_20210323T104021_N9999_R008_T31UFS_20220927T033809",
    ),
)

Filtering on the ID explicitly would require some further development.

samYnsat commented 8 months ago

@EmileSonneveld thanks for your answer, but I am still not able to get that single product. This is my workflow:

1. STAC API to filter through products:

import requests
import pandas as pd

def get_bounding_box_from_polygon(wkt_polygon):
    coordinates = wkt_polygon.strip('POLYGON(())').split(',')
    lats, lons = zip(*[map(float, coord.strip().split()) for coord in coordinates])
    return min(lats), min(lons), max(lats), max(lons)

wkt_polygon = "POLYGON((4.220581 50.958859,4.521264 50.953236,4.545977 50.906064,4.541858 50.802029,4.489685 50.763825,4.23843 50.767734,4.192435 50.806369,4.189689 50.907363,4.220581 50.958859))"

bbox = get_bounding_box_from_polygon(wkt_polygon)
bbox_str = f"{bbox[0]},{bbox[1]},{bbox[2]},{bbox[3]}"

url = f"https://catalogue.dataspace.copernicus.eu/resto/api/collections/Sentinel2/search.json?processingLevel=S2MSI2A&startDate=2023-01-01T00:00:00Z&completionDate=2024-01-18T23:59:59Z&maxRecords=10&box={bbox_str}"

search_results = requests.get(url).json()

for feature in search_results['features']:
    id = feature['id']
    product = feature['properties']['productIdentifier']
    print(f"Found Scene {id}: {product}")

pd.DataFrame.from_dict(search_results['features'])

Here I have some results, but I am interested in this one: Found Scene 8208c2c6-998e-4af3-a274-d286ae8df3da: /eodata/Sentinel-2/MSI/L2A/2023/04/20/S2B_MSIL2A_20230420T104619_N0509_R051_T31UES_20230420T122124.SAFE

2. OpenEO to download the product

pip install openeo

import openeo
from openeo import collection_property
from openeo.processes import eq

connection = openeo.connect("openeo.dataspace.copernicus.eu")
connection.authenticate_oidc_client_credentials(
    client_id='******************',
    client_secret='***********',
)

bbox = get_bounding_box_from_polygon(wkt_polygon)
spatial_extent = {'west': bbox[1],
                   'east': bbox[3],
                   'south': bbox[0],
                   'north': bbox[2],
}

s2Image = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent=spatial_extent,
    bands=["B01", "B02", "B03", "B04"],
    properties=dict(
        productIdentifier=lambda x: x == "/eodata/Sentinel-2/MSI/L2A/2023/04/20/S2B_MSIL2A_20230420T104619_N0509_R051_T31UES_20230420T122124",
    ),
)

s2Image.download("sentinel.tiff")

I get this error: [400] NoDataAvailable: There is no data available for the given extents. Could not find data for your load_collection request with catalog ID "Sentinel2". The catalog query had correlation ID "r-2402062029bd40529d41194d3460eecd" and returned 0 results. (ref: r-2402062029bd40529d41194d3460eecd)

I tried adding .SAFE to the end of the productIdentifier so that it exactly matches the name in the STAC API, but I still get the same error. I also checked the spatial_extent, and it is exactly the same as in the STAC API.

Can you see any flaws in this workflow?

Thanks, Sam

EmileSonneveld commented 8 months ago

Hi @samYnsat ,

For this specific product, the .SAFE extension is needed. (Some products have the extension, some do not.)
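Since the suffix differs per product, one way to avoid guessing is to copy the productIdentifier verbatim from the catalogue response. A minimal sketch, assuming the feature list has the same shape as `search_results['features']` in your script; `product_identifier_for` is a hypothetical helper, and the sample entry below only repeats the scene you listed:

```python
def product_identifier_for(features, feature_id):
    """Return the productIdentifier of the feature with this id, or None."""
    for feature in features:
        if feature["id"] == feature_id:
            return feature["properties"]["productIdentifier"]
    return None


# Stand-in for search_results["features"]; values copied from the scene above
features = [
    {
        "id": "8208c2c6-998e-4af3-a274-d286ae8df3da",
        "properties": {
            "productIdentifier": "/eodata/Sentinel-2/MSI/L2A/2023/04/20/"
            "S2B_MSIL2A_20230420T104619_N0509_R051_T31UES_20230420T122124.SAFE"
        },
    }
]

product_id = product_identifier_for(features, "8208c2c6-998e-4af3-a274-d286ae8df3da")
print(product_id)
```

The resulting `product_id` can then be used unchanged in the filter, for example `properties=dict(productIdentifier=lambda x: x == product_id)`.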

I flipped x and y in your example and got a working result above Brussels:

spatial_extent = {'west': bbox[0],
                  'east': bbox[2],
                  'south': bbox[1],
                  'north': bbox[3],
                  }

s2Image = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent=spatial_extent,
    bands=["B03"],  # one band for faster testing
    properties=dict(
        productIdentifier=lambda x: x == "/eodata/Sentinel-2/MSI/L2A/2023/04/20/S2B_MSIL2A_20230420T104619_N0509_R051_T31UES_20230420T122124.SAFE",
    ),
)


The query that openEO sent: https://catalogue.dataspace.copernicus.eu/resto/api/collections/Sentinel2/search.json?box=4.3782975118437175%2C50.3956641543367%2C6.083471338196296%2C51.47211347201322&sortParam=startDate&sortOrder=ascending&page=1&maxRecords=100&status=ONLINE&dataset=ESA-DATASET&productIdentifier=/eodata/Sentinel-2/MSI/L2A/2023/04/20/S2B_MSIL2A_20230420T104619_N0509_R051_T31UES_20230420T122124&productType=L2A&startDate=1970-01-01T00%3A00%3A00Z&completionDate=2070-01-01T00%3A00%3A00Z (Note that when no temporal extent is specified, openEO uses 1970 to 2070.)

You can ignore this warning: UserWarning: SENTINEL2_L2A property filtering with properties that are undefined in the collection metadata (summaries): productIdentifier. I'll remove it in a future release.
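The x/y mix-up comes from WKT listing each coordinate as "x y", that is longitude first and latitude second. A rough sketch of a helper that returns the extent with named keys, so there are no tuple indices to swap; `spatial_extent_from_wkt_polygon` is a hypothetical name, and it assumes a simple single-ring POLYGON like the one in your script:

```python
def spatial_extent_from_wkt_polygon(wkt_polygon):
    """Bounding box of a single-ring WKT POLYGON as an openEO-style extent dict."""
    text = wkt_polygon.strip()
    # Keep only the coordinate list between "((" and "))"
    inner = text[text.index("((") + 2 : text.rindex("))")]
    # WKT pairs are "x y", i.e. longitude first, latitude second
    lons, lats = zip(*(map(float, pair.split()) for pair in inner.split(",")))
    return {
        "west": min(lons),
        "south": min(lats),
        "east": max(lons),
        "north": max(lats),
    }


wkt_polygon = (
    "POLYGON((4.220581 50.958859,4.521264 50.953236,4.545977 50.906064,"
    "4.541858 50.802029,4.489685 50.763825,4.23843 50.767734,"
    "4.192435 50.806369,4.189689 50.907363,4.220581 50.958859))"
)
extent = spatial_extent_from_wkt_polygon(wkt_polygon)
print(extent)
```

The returned dict can be passed as `spatial_extent` to `load_collection`; polygons with holes or multipolygons would need a proper WKT parser instead.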

clausmichele commented 8 months ago

Another parameter that would be important to document by exposing the /queryables endpoint! https://github.com/Open-EO/openeo-geopyspark-driver/issues/536

EmileSonneveld commented 8 months ago

Hey @samYnsat,

Do the fixes from my previous message fix your issue?

If so, I can close this ticket!

Emile

EmileSonneveld commented 8 months ago

@clausmichele,

This is indeed the second parameter I had to add like this. On creo I had to add polarisation to SENTINEL1_GRD.

samYnsat commented 8 months ago

Hi, @EmileSonneveld Sorry, I had a rough day yesterday. I just checked it and it works. I can't believe that in the end it was the order of the array that was wrong. I thought I had checked it better. 😅 Thanks!