Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0

ML Data Cube Regularization #444

Open PondiB opened 1 year ago

PondiB commented 1 year ago

Regularized data cubes are a necessity for machine learning and deep learning on EO time series data. This process aims to eliminate the need for users to chain several processes to obtain a consistent data cube.

PondiB commented 1 year ago

@m-mohr, I would appreciate your eyes on this whenever you have a moment. I have fixed most of the failures, but tracing the remaining ones is taking me much longer than expected.

m-mohr commented 1 year ago

fyi: I won't get to it anytime soon, sorry.

PondiB commented 1 year ago

> fyi: I won't get to it anytime soon, sorry.

Thanks for getting back. It's fine. I'll figure it out soon.

soxofaan commented 10 months ago

I'm not sure I understand why this process is necessary. The description talks about "irregular" data, but if your data is in an openEO data cube, then it's pretty regular already. Your time instants could be spaced unevenly, but that doesn't mean that an ML model could not handle that.

This process looks like a combination of aggregate_temporal_period and resample_spatial, but:

In this state, I think ml_regularize_data_cube is missing quite a few parameters.

More generally: is there a compelling reason to define ml_regularize_data_cube, if we already have aggregate_temporal_period and resample_spatial?
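
To make the comparison concrete, here is a minimal sketch of how the two existing processes could be chained in an openEO process graph, written as a plain Python dict so it runs without any client library. The collection id, band names, extents, period, resolution and EPSG code are illustrative assumptions, not values taken from this thread or from any particular backend.

```python
import json

# Sketch of a flat openEO process graph: load -> temporal aggregation -> spatial resampling.
# All concrete values (collection id, bands, extents, EPSG code) are assumptions.
process_graph = {
    "load": {
        "process_id": "load_collection",
        "arguments": {
            "id": "SENTINEL2_L2A",  # assumed collection name, backend-specific
            "spatial_extent": {"west": 5.0, "south": 51.0, "east": 5.1, "north": 51.1},
            "temporal_extent": ["2023-01-01", "2023-07-01"],
            "bands": ["B04", "B08"],  # assumed band names
        },
    },
    "aggregate": {
        "process_id": "aggregate_temporal_period",
        "arguments": {
            "data": {"from_node": "load"},
            "period": "dekad",  # one composite per ten-day period
            # The reducer is a child process graph applied to each period.
            "reducer": {
                "process_graph": {
                    "median": {
                        "process_id": "median",
                        "arguments": {"data": {"from_parameter": "data"}},
                        "result": True,
                    }
                }
            },
        },
    },
    "resample": {
        "process_id": "resample_spatial",
        "arguments": {
            "data": {"from_node": "aggregate"},
            "resolution": 10,
            "projection": 32631,  # assumed EPSG code
            "method": "bilinear",
        },
        "result": True,  # marks the node whose output is returned
    },
}

# Nodes flagged as the result of the top-level graph.
result_nodes = [name for name, node in process_graph.items() if node.get("result")]

if __name__ == "__main__":
    print(json.dumps(process_graph, indent=2))
```

Note that this chain does not fill remaining gaps in the time series; whether a dedicated process should handle that is part of the question raised here.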

jdries commented 10 months ago

The use case has even been explored quite extensively in openEO platform, and made it into public examples:

https://github.com/Open-EO/openeo-community-examples/blob/main/python/BasicSentinelMerge/sentinel_merge.ipynb
https://github.com/openEOPlatform/openeo-classification/blob/main/src/openeo_classification/features.py#L117

PondiB commented 10 months ago

@soxofaan thanks for the feedback. In the OEMC project, we are planning to build a new openEO backend with a stronger focus on ML and DL capabilities for satellite image time series.

A regular data cube in our case means that: (a) there is a unique field function; (b) the spatial support is georeferenced; (c) temporal continuity is assured; (d) all spatiotemporal locations share the same set of attributes; and (e) there are no gaps or missing values in the spatiotemporal extent.
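
As a rough illustration of criteria (c) and (e), the sketch below checks a single pixel's time series in plain Python. Interpreting "temporal continuity" as evenly spaced timestamps is an assumption made for this example, and the function name `is_regular_series` is hypothetical.

```python
from datetime import date
import math


def is_regular_series(dates, values):
    """Return True if timestamps are evenly spaced and no value is missing.

    Assumption: "temporal continuity" is read as a constant, positive
    step (in days) between consecutive observations.
    """
    if len(dates) != len(values) or len(dates) < 2:
        return False
    # Set of distinct gaps between consecutive timestamps, in days.
    steps = {(later - earlier).days for earlier, later in zip(dates, dates[1:])}
    evenly_spaced = len(steps) == 1 and steps.pop() > 0
    # A value counts as missing if it is None or NaN.
    no_gaps = all(v is not None and not math.isnan(v) for v in values)
    return evenly_spaced and no_gaps


# Three observations, ten days apart, no missing values.
regular_dates = [date(2023, 1, 1), date(2023, 1, 11), date(2023, 1, 21)]
# Same start, but the last observation is four days late.
uneven_dates = [date(2023, 1, 1), date(2023, 1, 11), date(2023, 1, 25)]
```

A series such as `regular_dates` with complete values passes the check, while `uneven_dates` or any series containing `None` or NaN does not.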

In our discussion, two philosophies emerged, as shown in the image below, and we would like to support both: (1) allowing users to define their own processes before ML/DL operations, and (2) not bothering users with the underlying processes.

[Image: Screenshot 2023-09-25 at 14 54 41]

@jdries cool, I will check out the examples.

jdries commented 10 months ago

Nice, this is exactly what I happen to be working on for the moment, in support of a couple of projects using ML.

Maybe you already know, but openEO has a mechanism for building this kind of convenience function as a combination of existing processes: the openEO 'user-defined process' (UDP). Using this has a couple of advantages:

I see this case arising more often, so maybe we can create an open source GitHub repo with the definitions of these UDPs. That would allow users to reference the central repo, or allow backends to import those definitions.
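
For illustration, a UDP of this kind could look roughly like the sketch below, expressed as a plain Python dict that mirrors the JSON process definition. The id `regularize_cube`, its parameter names and its defaults are hypothetical, and the `datacube` subtype is assumed from recent versions of the process specification (older versions use `raster-cube`). The small helper only checks that every parameter referenced at the top level of the graph is declared.

```python
# Hypothetical UDP wrapping aggregate_temporal_period and resample_spatial.
udp = {
    "id": "regularize_cube",  # hypothetical process id
    "summary": "Temporal compositing followed by spatial resampling",
    "parameters": [
        {"name": "data", "description": "Input data cube.",
         "schema": {"type": "object", "subtype": "datacube"}},
        {"name": "period", "description": "Compositing period.",
         "schema": {"type": "string"}, "optional": True, "default": "dekad"},
        {"name": "resolution", "description": "Target resolution in CRS units.",
         "schema": {"type": "number"}, "optional": True, "default": 10},
    ],
    "process_graph": {
        "aggregate": {
            "process_id": "aggregate_temporal_period",
            "arguments": {
                "data": {"from_parameter": "data"},
                "period": {"from_parameter": "period"},
                "reducer": {"process_graph": {"median": {
                    "process_id": "median",
                    "arguments": {"data": {"from_parameter": "data"}},
                    "result": True,
                }}},
            },
        },
        "resample": {
            "process_id": "resample_spatial",
            "arguments": {
                "data": {"from_node": "aggregate"},
                "resolution": {"from_parameter": "resolution"},
            },
            "result": True,
        },
    },
}


def referenced_parameters(value):
    """Collect `from_parameter` names, skipping child process graphs (callbacks)."""
    if isinstance(value, dict):
        if set(value) == {"from_parameter"}:
            return {value["from_parameter"]}
        if "process_graph" in value:
            return set()  # callbacks have their own parameter scope
        return set().union(*(referenced_parameters(v) for v in value.values()))
    if isinstance(value, list):
        return set().union(*(referenced_parameters(v) for v in value))
    return set()


declared = {p["name"] for p in udp["parameters"]}
used = referenced_parameters(udp["process_graph"])
```

A shared repository could store such definitions as JSON files, which users or backends then load and register under the given id.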

Now about the actual process:

(1) https://rdrr.io/cran/sits/man/sits_regularize.html

m-mohr commented 8 months ago

@PondiB I think it would make sense to make PRs against the ml branch because otherwise all changes from the ML branch will also appear in this PR. This leads to confusion. Please rebase your changes against the ML branch if necessary and set the base branch of the PR to ml.

PondiB commented 8 months ago

> @PondiB I think it would make sense to make PRs against the ml branch because otherwise all changes from the ML branch will also appear in this PR. This leads to confusion. Please rebase your changes against the ML branch if necessary and set the base branch of the PR to ml.

Sure.