Open-EO / openeo-gfmap

Generic framework for EO mapping applications building on openEO
Apache License 2.0

Test access to AgERA5 collection on CDSE through `load_stac` from Terrascope collection #90

Closed: kvantricht closed this issue 2 weeks ago

kvantricht commented 1 month ago
GriffinBabe commented 1 month ago

@kvantricht, apparently the catalogue for AGERA5 still does not exist. Should we ignore that aspect of the pipeline for the moment and leave it as NO_DATA, or should we work toward a workaround?

kvantricht commented 1 month ago

This is a bit annoying, as it was put on the task list a long time ago and @soxofaan mentioned the issue was closed as solved? For your purpose, please proceed by setting the meteo stream to no data. I'm not sure how the model will respond to that, so it needs to be solved. @soxofaan, who can help build the catalogue to allow load_stac from Terrascope on CDSE?

cc @jdries

GriffinBabe commented 1 month ago

I think the confusion arises because there is a workaround in the openEO aggregator that allows AgERA5 data on Terrascope to be used in a CDSE job, as @jdries mentioned in the OpenEO-users channel. I would need more information on how to do that while the catalogue is being built.

kvantricht commented 1 month ago

OK, focus on the workflow without meteo first, until there is a clear example we can copy on how to load AgERA5 properly. Is there an action point for someone else to follow up on exactly that?

soxofaan commented 1 month ago

FYI, the AGERA5 collection is available.

It can already be used from there to create a proof of concept and unblock progress.

I'll cook up an example of how to combine AGERA5 with other CDSE data through the CDSE aggregator.

soxofaan commented 1 month ago

FYI: here is a usage example purely on Terrascope, including merge_cubes of AgERA5 and Sentinel-2: https://gist.github.com/soxofaan/17e379cd0577c9a01fc4d5e3ea5c5186
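The following is a minimal, hypothetical sketch (not the gist's code) of the kind of openEO process graph such a Terrascope-only example builds: AgERA5 and Sentinel-2 are loaded, the coarse AgERA5 grid is resampled onto the Sentinel-2 grid with `resample_cube_spatial`, and the two cubes are combined with `merge_cubes`. It writes the graph as a plain Python dict in the openEO process graph JSON format, so it runs without a connection. The collection ID `SENTINEL2_L2A`, the band names and the extents are assumptions for illustration.

```python
import json

# Hypothetical extent, for illustration only (WGS84 lon/lat).
SPATIAL_EXTENT = {"west": 5.0, "south": 51.0, "east": 5.1, "north": 51.1}
TEMPORAL_EXTENT = ["2020-06-01", "2020-07-01"]


def build_merge_graph(meteo_bands, s2_bands):
    """Build an openEO process graph merging AgERA5 with Sentinel-2.

    Assumption: both collections live on the same backend (Terrascope)
    under the IDs "AGERA5" and "SENTINEL2_L2A".
    """
    return {
        "loadmeteo": {
            "process_id": "load_collection",
            "arguments": {
                "id": "AGERA5",
                "spatial_extent": SPATIAL_EXTENT,
                "temporal_extent": TEMPORAL_EXTENT,
                "bands": meteo_bands,
            },
        },
        "loads2": {
            "process_id": "load_collection",
            "arguments": {
                "id": "SENTINEL2_L2A",
                "spatial_extent": SPATIAL_EXTENT,
                "temporal_extent": TEMPORAL_EXTENT,
                "bands": s2_bands,
            },
        },
        # AgERA5 is on a much coarser grid than Sentinel-2, so align it first.
        "resample": {
            "process_id": "resample_cube_spatial",
            "arguments": {
                "data": {"from_node": "loadmeteo"},
                "target": {"from_node": "loads2"},
                "method": "bilinear",
            },
        },
        # Band names differ between the cubes, so no overlap_resolver is needed.
        "merge": {
            "process_id": "merge_cubes",
            "arguments": {
                "cube1": {"from_node": "loads2"},
                "cube2": {"from_node": "resample"},
            },
        },
        "save": {
            "process_id": "save_result",
            "arguments": {"data": {"from_node": "merge"}, "format": "netCDF"},
            "result": True,
        },
    }


# "temperature-mean" is a hypothetical AgERA5 band name.
graph = build_merge_graph(["temperature-mean"], ["B04", "B08"])
print(json.dumps(graph, indent=2))
```

In practice the daily AgERA5 time axis and the irregular Sentinel-2 acquisition dates usually also need aligning (for example with `aggregate_temporal_period`) before merging; that step is left out here.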

soxofaan commented 1 month ago

And here is a usage example that runs an AgERA5 batch job on Terrascope and loads the result in CDSE, where it is combined with merge_cubes: https://gist.github.com/soxofaan/53e85ac771a6cce144f048f11956e620
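A hedged sketch of the second step in that two-backend pattern (not the gist's code): on CDSE, the AgERA5 output of the Terrascope batch job is read back with the openEO `load_stac` process and merged with a CDSE collection. The results URL is a hypothetical placeholder (in practice the signed job results URL of the finished Terrascope job), and `SENTINEL2_L2A` and the band names are assumptions.

```python
import json

# Hypothetical placeholder for the signed STAC results URL of the finished
# AgERA5 batch job on Terrascope; not a real endpoint.
TERRASCOPE_RESULTS_URL = "https://example.invalid/jobs/JOB_ID/results"


def build_cdse_graph(results_url, spatial_extent, temporal_extent):
    """Process graph to submit on CDSE: AgERA5 via load_stac, S2 via load_collection."""
    return {
        # Read the Terrascope job output as a STAC collection.
        "meteo": {
            "process_id": "load_stac",
            "arguments": {
                "url": results_url,
                "spatial_extent": spatial_extent,
                "temporal_extent": temporal_extent,
            },
        },
        # Assumed CDSE collection ID and band names.
        "s2": {
            "process_id": "load_collection",
            "arguments": {
                "id": "SENTINEL2_L2A",
                "spatial_extent": spatial_extent,
                "temporal_extent": temporal_extent,
                "bands": ["B04", "B08"],
            },
        },
        # Resampling AgERA5 onto the S2 grid is omitted here for brevity.
        "merged": {
            "process_id": "merge_cubes",
            "arguments": {
                "cube1": {"from_node": "s2"},
                "cube2": {"from_node": "meteo"},
            },
        },
        "save": {
            "process_id": "save_result",
            "arguments": {"data": {"from_node": "merged"}, "format": "netCDF"},
            "result": True,
        },
    }


graph = build_cdse_graph(
    TERRASCOPE_RESULTS_URL,
    {"west": 5.0, "south": 51.0, "east": 5.1, "north": 51.1},
    ["2020-06-01", "2020-07-01"],
)
print(json.dumps(graph["meteo"], indent=2))
```

The cost of this design is the one raised later in the thread: two jobs on two backends, each of which can fail independently.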

kvantricht commented 1 month ago

Thanks @soxofaan! @GriffinBabe, is this sufficient to easily include meteo in the crop/no-crop UDF?

soxofaan commented 1 month ago

I also tried to construct a use case using the crossbackend feature on the CDSE aggregator (where the load_stac split is handled automatically), but that is currently blocked by the Terrascope backend not supporting CDSE OIDC tokens (https://github.com/eu-cdse/openeo-cdse-infra/issues/113).
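To make "the load_stac split is handled automatically" concrete, here is a deliberately simplified, hypothetical illustration of the idea, not the aggregator's actual implementation: `load_collection` nodes whose collection lives on another backend are cut out into their own job for that backend, and in the main graph each such node is replaced by a `load_stac` on that job's results. The collection-to-backend mapping and the placeholder URL are assumptions.

```python
import copy

# Hypothetical mapping of collection IDs to the backend that hosts them.
COLLECTION_BACKEND = {"AGERA5": "terrascope", "SENTINEL2_L2A": "cdse"}


def split_crossbackend(graph, primary="cdse"):
    """Split off load_collection nodes hosted on a non-primary backend.

    Returns (main_graph, secondary_jobs). Each secondary job is a
    (backend, process_graph) pair; in the main graph the original node is
    replaced by load_stac on a placeholder results URL.
    """
    main = copy.deepcopy(graph)
    secondary_jobs = []
    for node_id, node in graph.items():
        if node["process_id"] != "load_collection":
            continue
        backend = COLLECTION_BACKEND.get(node["arguments"]["id"], primary)
        if backend == primary:
            continue
        # A real job would also need a save_result node; omitted here.
        job_graph = {node_id: dict(copy.deepcopy(node), result=True)}
        secondary_jobs.append((backend, job_graph))
        args = node["arguments"]
        main[node_id] = {
            "process_id": "load_stac",
            "arguments": {
                # Placeholder: the real URL is only known once that job has run.
                "url": f"<results of {node_id} on {backend}>",
                "spatial_extent": args.get("spatial_extent"),
                "temporal_extent": args.get("temporal_extent"),
            },
        }
    return main, secondary_jobs


example = {
    "meteo": {"process_id": "load_collection", "arguments": {"id": "AGERA5"}},
    "s2": {"process_id": "load_collection", "arguments": {"id": "SENTINEL2_L2A"}},
    "merged": {
        "process_id": "merge_cubes",
        "arguments": {"cube1": {"from_node": "s2"}, "cube2": {"from_node": "meteo"}},
        "result": True,
    },
}
main, jobs = split_crossbackend(example)
print(main["meteo"]["process_id"], [backend for backend, _ in jobs])
```

Running the secondary job on Terrascope on the user's behalf is exactly where the CDSE OIDC token has to be accepted by Terrascope, which is the blocker mentioned above.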

GriffinBabe commented 1 month ago

Thanks @soxofaan, I can use your examples to build a first prototype. But for our application (to be delivered mid-June) we cannot use the Terrascope backend, as we require global S1 & S2 collections. The solution of first creating a job on Terrascope to extract meteo and then running another job on CDSE is also suboptimal: it increases the number of failure points, requires users to create multiple openEO accounts, and is generally more work for everyone to maintain.

@soxofaan, I think the best solution would be to wait for a STAC catalogue of the AGERA5 collection that is directly accessible from the CDSE backend. Do you know how long that could take? Is it currently a priority for the openEO team?

VictorVerhaert commented 1 month ago

Issue on creating the STAC collection for AgERA5: https://github.com/Open-EO/openeo-geopyspark-driver/issues/591

soxofaan commented 1 month ago

The goal is to automatically run an AgERA5 batch job on Terrascope from the CDSE "aggregator". I'm actually working on that at the moment, and something like this already works on the CDSE staging environment:

```python
import openeo

connection = openeo.connect(
    # Note: this is staging CDSE, requiring a CDSE staging account (which is different from the standard CDSE account)
    url="openeofed.dev.warsaw.openeo.dataspace.copernicus.eu",
)
connection.authenticate_oidc()

cube = connection.load_collection(
    collection_id="AGERA5",
    # ... further arguments (e.g. spatial/temporal extent) go here
)

# Submitted to the CDSE federation, which runs the job on Terrascope.
cube.execute_batch(out_format="netCDF")
```

Here, you connect to and interact with the CDSE (Federation aka aggregator), but the actual job runs in the background on Terrascope.

jdries commented 1 month ago

An initial STAC catalog with data for 2020 can be tested at https://radiantearth.github.io/stac-browser/#/external/stac.openeo.vito.be/collections/agera5_daily. It would be good to have feedback before starting the full ingest.
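One quick way to give that feedback is to query the catalog's STAC API item search directly. The sketch below only builds the request URLs with the standard library. It assumes the catalog root is `https://stac.openeo.vito.be` (taken from the browser link above) and that it implements the standard STAC API `/search` endpoint with the `collections`, `bbox`, `datetime` and `limit` query parameters; the collection URL is also the kind of URL one would pass to openEO `load_stac`.

```python
from urllib.parse import urlencode

# Assumption: STAC API root behind the STAC browser link in this thread.
STAC_ROOT = "https://stac.openeo.vito.be"
COLLECTION_ID = "agera5_daily"


def collection_url(root=STAC_ROOT, collection_id=COLLECTION_ID):
    """URL of the STAC collection, e.g. as input for openEO load_stac."""
    return f"{root}/collections/{collection_id}"


def item_search_url(bbox, start, end, limit=10, root=STAC_ROOT):
    """GET /search URL for items intersecting bbox within [start, end].

    bbox is (west, south, east, north) in WGS84 degrees, as STAC requires.
    """
    west, south, east, north = bbox
    params = {
        "collections": COLLECTION_ID,
        "bbox": f"{west},{south},{east},{north}",
        "datetime": f"{start}/{end}",  # RFC 3339 closed interval
        "limit": limit,
    }
    return f"{root}/search?{urlencode(params)}"


print(collection_url())
print(item_search_url((5.0, 51.0, 5.1, 51.1), "2020-06-01T00:00:00Z", "2020-06-30T23:59:59Z"))
```

Comparing the item count of a search with and without `bbox` is a cheap check of whether spatial filtering works on the server side.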

kvantricht commented 1 month ago

@GriffinBabe @VincentVerelst is this something we can check?

VincentVerelst commented 1 month ago

@jdries, it seems that providing any bounding box leads to an empty collection.

Without a bounding box, it does give results.
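When a bbox filter unexpectedly returns nothing while an unfiltered query does, two things worth checking locally are the bbox convention and whether the bbox actually intersects the footprints of the items returned without a filter. STAC uses `[west, south, east, north]` in WGS84 longitude/latitude, so swapping latitude and longitude silently moves the query elsewhere. The helpers below are a hypothetical debugging aid, not part of any openEO or STAC library, and do not handle antimeridian-crossing boxes; the sample item IDs are made up.

```python
def validate_bbox(bbox):
    """Raise ValueError if bbox is not a plausible [west, south, east, north]."""
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError(f"longitude out of range in {bbox}")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError(f"latitude out of range in {bbox}")
    if west > east or south > north:
        # west > east would be legal for antimeridian-crossing boxes,
        # which this simple check deliberately does not support.
        raise ValueError(f"bbox must be ordered [west, south, east, north]: {bbox}")


def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def items_hit_by(query_bbox, items):
    """IDs of STAC items (dicts with 'id' and 'bbox') the query should match."""
    validate_bbox(query_bbox)
    return [item["id"] for item in items if bbox_intersects(query_bbox, item["bbox"])]


# Made-up items, as an unfiltered query might return them (global daily grids).
items = [
    {"id": "agera5_2020-06-01", "bbox": [-180.0, -90.0, 180.0, 90.0]},
    {"id": "agera5_2020-06-02", "bbox": [-180.0, -90.0, 180.0, 90.0]},
]
print(items_hit_by([5.0, 51.0, 5.1, 51.1], items))
```

If the items' footprints do intersect the query bbox locally, an empty server response points at the catalog side rather than at the request.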

jdries commented 4 weeks ago

@VincentVerelst, the catalog was reset and again only has data for 2022. It now does return items for the request you provided.
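Since the ingested period can change between catalog resets, a small check of which dates the returned items actually cover can help when testing. A minimal sketch, assuming a STAC item search response (a GeoJSON FeatureCollection) already parsed into a dict, where each item has a non-null `properties.datetime` as in the STAC item spec; the sample response is made up.

```python
from datetime import datetime


def temporal_coverage(feature_collection):
    """Return (first, last, years) over the items' properties.datetime."""
    dates = []
    for feature in feature_collection.get("features", []):
        value = feature["properties"]["datetime"]
        # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
        dates.append(datetime.fromisoformat(value.replace("Z", "+00:00")))
    if not dates:
        return None, None, set()
    return min(dates), max(dates), {d.year for d in dates}


# Made-up response for illustration.
response = {
    "type": "FeatureCollection",
    "features": [
        {"id": "a", "properties": {"datetime": "2022-03-01T00:00:00Z"}},
        {"id": "b", "properties": {"datetime": "2022-03-02T00:00:00Z"}},
    ],
}
first, last, years = temporal_coverage(response)
print(first.date(), last.date(), sorted(years))
```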