Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with OpenEO GeoPySpark backend.
Apache License 2.0

allow load_stac using the alternate local href instead #308

Closed VictorVerhaert closed 3 months ago

VictorVerhaert commented 5 months ago

For this collection, https://stac.terrascope.be/collections/sentinel-2-l2a, the backend does not support the authentication needed to use the regular asset hrefs. An alternate local href is available for each asset, which the backend should be able to use.

Discussion is needed to determine when to use the alternate href instead of the regular one. One simple approach would be to check whether the root link is stac.terrascope.be and, if so, whether an alternate local href is present.
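A minimal sketch of that root-link check, in Python for illustration only (the backend itself is Scala/Python, and this is not its actual code). It assumes assets follow the STAC Alternate Assets extension layout, with an `alternate` object keyed by name. The `ALTERNATE_HOSTS` set, the `pick_href` helper and the example hrefs are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical: catalog hosts for which the alternate href should be preferred.
ALTERNATE_HOSTS = {"stac.terrascope.be"}


def pick_href(asset: dict, root_href: str, alternate_key: str = "local") -> str:
    """Return the alternate href when the catalog root matches, else the regular href."""
    host = urlparse(root_href).hostname
    if host in ALTERNATE_HOSTS:
        # Assumed layout: {"alternate": {"local": {"href": "..."}}}
        alternate = asset.get("alternate", {}).get(alternate_key, {})
        if "href" in alternate:
            return alternate["href"]
    # Fall back to the regular href when no alternate applies.
    return asset["href"]


# Placeholder asset; both hrefs are made up for the example.
asset = {
    "href": "https://services.terrascope.be/download/example.jp2",
    "alternate": {"local": {"href": "/data/example.jp2"}},
}
print(pick_href(asset, "https://stac.terrascope.be"))
print(pick_href(asset, "https://example.com/stac"))
```

Falling back to the regular href keeps behaviour unchanged for every catalog that is not explicitly listed.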

JeroenVerstraelen commented 4 months ago

This is a blocking issue for Victor.

JeroenVerstraelen commented 4 months ago

S3 alternate links are the most important. Perhaps we can use a config listing the links for which we should always use the alternate link.

High priority: https://catalogue.dataspace.copernicus.eu/stac/collections/GLOBAL-MOSAICS

Low priority: https://stac.terrascope.be/collections/sentinel-2-l2a
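The config idea could look roughly like the sketch below. It is only an illustration: the `ALTERNATE_HREF_CONFIG` mapping, the `resolve_href` helper, the alternate keys `"s3"` and `"local"`, and the example hrefs are all assumptions, not the backend's actual configuration.

```python
# Hypothetical config: collection URL prefix -> alternate keys to try, in order.
ALTERNATE_HREF_CONFIG = {
    "https://catalogue.dataspace.copernicus.eu/stac/collections/GLOBAL-MOSAICS": ["s3"],
    "https://stac.terrascope.be/collections/sentinel-2-l2a": ["local"],
}


def resolve_href(asset: dict, collection_url: str, config: dict = ALTERNATE_HREF_CONFIG) -> str:
    """Pick the first configured alternate href for this collection, else the regular href."""
    for prefix, keys in config.items():
        if collection_url.startswith(prefix):
            alternates = asset.get("alternate", {})
            for key in keys:
                href = alternates.get(key, {}).get("href")
                if href:
                    return href
    # Collection not configured, or no matching alternate present.
    return asset["href"]


# Placeholder asset; both hrefs are made up for the example.
mosaic_asset = {
    "href": "https://example.org/download/B02.tif",
    "alternate": {"s3": {"href": "s3://example-bucket/B02.tif"}},
}
print(resolve_href(
    mosaic_asset,
    "https://catalogue.dataspace.copernicus.eu/stac/collections/GLOBAL-MOSAICS",
))
```

Keeping the list in a config would let new catalogs be added without a code change.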

https://radiantearth.github.io/stac-browser/#/?.language=en

EmileSonneveld commented 4 months ago

GeoTiffRasterSource uses the correct credentials for the https://services.terrascope.be domain, while GDALRasterSource does not. This is because CustomizableHttpRangeReader is not used for jp2 files, and setURLStreamHandlerFactory is not used in GDALRasterSource either. It should still be possible to pass credentials with /vsicurl/ (https://gdal.org/user/virtual_file_systems.html#vsicurl). If this is not straightforward, it is indeed best to use the alternate link.
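For reference, a rough sketch of what passing credentials through /vsicurl/ might involve. `GDAL_HTTP_AUTH` and `GDAL_HTTP_USERPWD` are GDAL configuration options documented for /vsicurl/. The `vsicurl_source` helper, the example URL and the credentials are hypothetical, and the thread does not confirm that basic auth is what services.terrascope.be expects.

```python
def vsicurl_source(url: str, user: str, password: str):
    """Build a /vsicurl/ path plus the GDAL config options carrying the credentials."""
    options = {
        "GDAL_HTTP_AUTH": "BASIC",                  # assumption: basic auth
        "GDAL_HTTP_USERPWD": f"{user}:{password}",  # "user:password" format
    }
    return f"/vsicurl/{url}", options


# Placeholder URL and credentials.
path, options = vsicurl_source(
    "https://services.terrascope.be/download/example.jp2", "user", "secret"
)
print(path)

# With the GDAL Python bindings installed, the options could be applied like this:
# from osgeo import gdal
# for key, value in options.items():
#     gdal.SetConfigOption(key, value)
# dataset = gdal.Open(path)
```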

EmileSonneveld commented 4 months ago

The STAC catalog refers to files that do not exist on disk. For example, this whole folder is empty: /data/MTDA/CGS_S2/CGS_S2_L2A/2024/06/15 (as are all earlier days that month). The Terrascope URL gives a 404 in this case too, even when the correct credentials are used.

https://services.terrascope.be/download/CGS_S2_L2A/2024/06/15/S2A_MSIL2A_20240615T104031_N0510_R008_T31UFS_20240615T181049/S2A_MSIL2A_20240615T104031_N0510_R008_T31UFS_20240615T181049.SAFE/GRANULE/L2A_T31UFS_A046909_20240615T104525/IMG_DATA/R20m/T31UFS_20240615T104031_B01_20m.jp2

EmileSonneveld commented 4 months ago

Script to test:

import openeo
url = "https://openeo.cloud/"

connection = openeo.connect(url).authenticate_oidc()
spatial_extent = {
    "east": -8.4,
    "north": 40.3,
    "south": 40.2,
    "west": -8.5
}

temporal_extent = ["2020-06-03", "2020-06-04"]

cube = connection.load_stac(
    "https://stac.terrascope.be/collections/sentinel-2-l2a",
    spatial_extent=spatial_extent,
    temporal_extent=temporal_extent,
)

job = cube.create_job()
job.start_and_wait()
job.get_results().download_files()

EmileSonneveld commented 3 months ago

This is deployed on https://openeo-dev.vito.be and https://openeo-staging.dataspace.copernicus.eu/. The catalog https://stac.terrascope.be/collections/sentinel-2-l2a works fine, thanks also to Stijn's cleanup of deleted products.

But GLOBAL-MOSAICS still had the bands issue (https://github.com/Open-EO/openeo-geopyspark-driver/issues/762):

cube = connection.load_stac(
    "https://catalogue.dataspace.copernicus.eu/stac/collections/GLOBAL-MOSAICS",
    spatial_extent={
        "east": -8.4,
        "north": 40.3,
        "south": 40.2,
        "west": -8.5
    },
    temporal_extent=["2022-07-01", "2022-07-04"],
)

OpenEO batch job failed: OpenEOApiException(status_code=400, code='Internal', message='No band assets found in items; a band asset requires an "eo:bands" property with a "name".', id='no-request')
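To see which assets of an item would satisfy the condition named in that message, a small check like the following could help. It is a sketch based only on the wording of the error (an asset needs an "eo:bands" property whose entries have a "name"), not on the driver's actual implementation. The `band_asset_names` helper and the sample item are hypothetical.

```python
def band_asset_names(item: dict) -> dict:
    """Map asset key -> band names, for assets whose "eo:bands" entries carry a "name"."""
    result = {}
    for key, asset in item.get("assets", {}).items():
        names = [band["name"] for band in asset.get("eo:bands", []) if "name" in band]
        if names:
            result[key] = names
    return result


# Made-up item: one asset with band metadata, one without.
sample_item = {
    "assets": {
        "B02": {"href": "https://example.org/B02.tif", "eo:bands": [{"name": "B02"}]},
        "thumbnail": {"href": "https://example.org/thumb.png"},
    }
}
print(band_asset_names(sample_item))
```

An empty result for every item in a collection would match the "No band assets found in items" failure.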