Open earthpulse opened 5 months ago
@jdries @juansensio @jamesemwheeler Here is a more detailed specification of this use case:
As a user, I want to make use of the EuroCrops dataset in EOTDL, create a filtered subset (EOTDL functionality), use openEO from within EOTDL to generate predictive features from S1 and S2 time series, then train a model in EOTDL and run inference with that model in CDSE.
Define the list of features that we want to compute for this task.
We can reuse the S1 and S2 pipelines from WorldCereal (features already validated).
Below I share an example of how we typically access custom STAC collections:
openeo-community-examples/python/LoadStac/load-stac-item-example.ipynb
The example provided in https://github.com/earthpulse/eotdl/blob/main/tutorials/notebooks/forest-map.ipynb feels like a more natural approach, and a workflow we could provide as well.
@juansensio could you clarify whether you want openEO to access the EuroCrops dataset, or whether we want to extract S1 and S2 data that match the spatio-temporal bounds of the EuroCrops dataset?
I believe openEO would be better suited to:
1) select a region of interest
2) define a desired preprocessing methodology (save it as a process graph)
3) download the preprocessed data
4) train the desired model on the data
5) combine the standardized preprocessing with the model to run inference
@juansensio @Patrick1G any feedback on how best to steer this use-case?
Patrick knows more about the use case, but as far as I understand the EuroCrops dataset contains crop classes for parcel polygons, so the goal would be to pair it with additional variables derived from S1/S2 (for example yearly mean NDVI).
openEO should be used to get these variables through a feature engineering pipeline, so we can use them to train a model and then re-use the pipeline at inference time.
Here we can delegate the entire process to openEO, or rely on EOTDL to retrieve the geometries from the STAC catalog and pass them to openEO... I guess the second option is better since we do not need openEO to access the dataset in EOTDL directly (just pass the resulting STAC catalog with geometries).
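To make the "yearly mean NDVI" example concrete, here is a minimal pure-Python sketch of that one feature, independent of openEO. It assumes per-parcel mean reflectances for the red and near-infrared bands are already available; the sample values and the function names `ndvi` and `yearly_mean_ndvi` are hypothetical and not part of EOTDL or openEO.

```python
from statistics import fmean


def ndvi(red, nir):
    """NDVI = (NIR - RED) / (NIR + RED); None when the denominator is zero."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else None


def yearly_mean_ndvi(observations):
    """Average NDVI over all valid (red, nir) reflectance pairs of one parcel."""
    values = [v for red, nir in observations if (v := ndvi(red, nir)) is not None]
    return fmean(values) if values else None


# Hypothetical parcel-mean reflectances (red, near infrared) for one year.
parcel_obs = [(0.10, 0.30), (0.08, 0.40), (0.05, 0.45)]
feature = yearly_mean_ndvi(parcel_obs)
```

In the real pipeline the same computation would run inside the openEO process graph, so that it can be reused unchanged at inference time.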
@HansVRP @juansensio the use case is described in detail above; let's follow those steps, please.
Next steps then:
Not quite sure how step #2 above should be done: EuroCrops contains millions of parcel polygons, and to train a model we only need a subset, e.g. constrained to a country, selected crop types, and a random selection of n polygons within that selection. I don't think openEO provides good functionality to do this, so it could be done in EOTDL with Python libraries. As a first step, this could also be done offline. To be discussed at the next meeting.
Okay, I already have a first version up on https://github.com/earthpulse/eotdl/tree/hv_openeoexample
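A minimal sketch of the subsetting described above (country, crop types, then n random parcels), using only the Python standard library. The record layout and the field names `country` and `crop_code` are assumptions for illustration; the real EuroCrops attribute names may differ, and `select_parcels` is a hypothetical helper, not an EOTDL function.

```python
import random


def select_parcels(parcels, country, crop_codes, n, seed=42):
    """Filter parcels by country and crop type, then draw up to n at random."""
    candidates = [
        p for p in parcels
        if p["country"] == country and p["crop_code"] in crop_codes
    ]
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return rng.sample(candidates, min(n, len(candidates)))


# Hypothetical parcel records standing in for EuroCrops features.
crops = ["wheat", "maize", "barley", "wheat", "maize", "wheat", "potato", "maize"]
parcels = [
    {"id": i, "country": "BE" if i % 2 else "NL", "crop_code": code}
    for i, code in enumerate(crops)
]
subset = select_parcels(parcels, country="BE", crop_codes={"wheat", "maize"}, n=2)
```

The geometries of the selected parcels could then be handed to openEO as the spatial filter for feature extraction.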
Todo
@juansensio Does EOTDL have a dedicated CDSE S3 storage which we can use to save the results into?
@Patrick1G @jdries
For S2 I used Best Available Pixel (BAP) composites, which create monthly composites with a minimal amount of clouds. Afterwards I calculated some typical features (percentiles): https://github.com/earthpulse/eotdl/blob/hv_openeoexample/tutorials/notebooks/openeo/generate_s3_UDP.py
For S1 I used a similar approach https://github.com/earthpulse/eotdl/blob/hv_openeoexample/tutorials/notebooks/openeo/generate_s1_UDP.py
Please let me know your thoughts
@HansVRP the resources above are not accessible.
But it's important to keep the EO science aspects in mind here: we need to generate features/metrics at a high temporal frequency, as this is the critical information for crop type prediction, so metrics at 5-, 7-, or 10-day intervals, not monthly BAP composites. Therefore I would suggest using a feature engineering approach similar to the one in the S1metrics notebook above: {min, mean, max, stddev, Q25, Q50, Q75, Q90}, generated at e.g. a 10-day interval for the year of the EuroCrops dataset.
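As a rough illustration of the suggested metric set, the sketch below groups a single-band time series into 10-day windows counted from 1 January and summarises each window with {min, mean, max, stddev, Q25, Q50, Q75, Q90}. This is only one possible reading of the proposal (statistics over the observations inside each window); the helper names and sample values are hypothetical, and the percentile uses simple linear interpolation rather than any specific openEO process.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean, pstdev


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in 0..100) of a pre-sorted list."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def interval_metrics(series, year, days=10):
    """Group (date, value) pairs into fixed-length windows and summarise each."""
    start = date(year, 1, 1)
    bins = defaultdict(list)
    for day, value in series:
        bins[(day - start).days // days].append(value)
    metrics = {}
    for index, values in sorted(bins.items()):
        values.sort()
        metrics[index] = {
            "min": values[0],
            "mean": fmean(values),
            "max": values[-1],
            "stddev": pstdev(values),  # population std; 0.0 for a single value
            **{f"Q{q}": percentile(values, q) for q in (25, 50, 75, 90)},
        }
    return metrics


# Hypothetical observations of one band for one parcel.
series = [
    (date(2021, 1, 1), 1.0), (date(2021, 1, 5), 2.0),
    (date(2021, 1, 9), 3.0), (date(2021, 1, 12), 5.0),
]
metrics = interval_metrics(series, year=2021)
```

Changing `days` to 5 or 7 gives the other interval lengths mentioned above.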
@Patrick1G @jdries please review the current version.
Here I used weekly composites, from which I calculate the P10, P25, P50, P75, and P90 percentiles.
The statistics can easily be expanded if required. For now I kept them limited, as I compute the statistics across 10 S2 bands and 2 S1 bands, which already results in a netCDF with 60 bands.
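The band count follows from 12 input bands times 5 percentiles. A small sketch of that bookkeeping is shown below; the concrete band names and the `<band>_P<percentile>` naming scheme are assumptions for illustration, since the thread only states the number of bands.

```python
from itertools import product

# Assumed band lists: the thread only states "10 S2 bands and 2 S1 bands".
S2_BANDS = ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
S1_BANDS = ["VV", "VH"]
PERCENTILES = [10, 25, 50, 75, 90]

# One output band per (input band, percentile) combination.
feature_names = [
    f"{band}_P{p}" for band, p in product(S2_BANDS + S1_BANDS, PERCENTILES)
]
n_features = len(feature_names)  # (10 + 2) * 5
```

Adding the eight statistics suggested earlier in the thread instead of five percentiles would grow the output to 12 * 8 = 96 bands, which is why the set was kept small for now.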
feature engineering for parcels in eurocrops (temporal aggregation on some indices, for example)