PondiB opened 1 year ago
@m-mohr, I'd appreciate your eyes on this whenever you have a moment. I have fixed most of the failures, but this one is taking me much longer to trace.
fyi: I won't get to it anytime soon, sorry.
Thanks for getting back. It's fine. I'll figure it out soon.
I'm not sure I understand why this process is necessary. The description talks about "irregular", but if your data is in an openEO data cube, then it's pretty regular already. Your time instants could be spaced unevenly, but that doesn't mean an ML model could not handle that.
This process looks like a combination of aggregate_temporal_period and resample_spatial, but:

- aggregate_temporal_period uses a different period specification format
- aggregate_temporal_period has a reducer argument which ml_regularize_data_cube is missing, I guess
- resample_spatial has projection and method arguments (and some more) which are also missing here

In this state, I think ml_regularize_data_cube is missing quite some parameters.

More generally: is there a compelling reason to define ml_regularize_data_cube if we already have aggregate_temporal_period and resample_spatial?
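To make the comparison concrete, here is a rough sketch of an openEO process graph that chains aggregate_temporal_period and resample_spatial to approximate what ml_regularize_data_cube would do. The collection id, period, resolution, and projection values are illustrative, not taken from the proposal.

```python
# Hypothetical openEO process graph: regularize in time, then resample in space.
# Node names and argument values are examples only.
graph = {
    "load": {
        "process_id": "load_collection",
        "arguments": {
            "id": "SENTINEL2_L2A",
            "spatial_extent": None,
            "temporal_extent": ["2021-01-01", "2021-12-31"],
        },
    },
    "agg": {
        "process_id": "aggregate_temporal_period",
        "arguments": {
            "data": {"from_node": "load"},
            "period": "dekad",  # the period format aggregate_temporal_period expects
            # the reducer argument that ml_regularize_data_cube lacks:
            "reducer": {
                "process_graph": {
                    "m": {
                        "process_id": "mean",
                        "arguments": {"data": {"from_parameter": "data"}},
                        "result": True,
                    }
                }
            },
        },
    },
    "resample": {
        "process_id": "resample_spatial",
        "arguments": {
            "data": {"from_node": "agg"},
            "resolution": 10,
            "projection": 3035,   # projection/method arguments missing from the proposal
            "method": "bilinear",
        },
        "result": True,
    },
}
```

Anything ml_regularize_data_cube adds on top of this chain would need to show up as extra parameters or semantics not expressible here.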
The use case has even been explored quite extensively in openEO platform, and made it into public examples:
https://github.com/Open-EO/openeo-community-examples/blob/main/python/BasicSentinelMerge/sentinel_merge.ipynb
https://github.com/openEOPlatform/openeo-classification/blob/main/src/openeo_classification/features.py#L117
@soxofaan thanks for the feedback. In the OEMC project we are planning to build a new openEO backend with a stronger focus on ML and DL capabilities for satellite image time series.
A regular data cube in our case means: (a) there is a unique field function; (b) the spatial support is georeferenced; (c) temporal continuity is assured; (d) all spatiotemporal locations share the same set of attributes; and (e) there are no gaps or missing values in the spatiotemporal extent.
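As a minimal illustration of conditions (c) and (e), here is a small stand-alone check for even temporal spacing. The function name, the 10-day step, and the sample dates are my own illustrative choices, not part of the proposed process.

```python
from datetime import date, timedelta

def is_temporally_regular(dates, step=timedelta(days=10)):
    """Return True if consecutive time instants are evenly spaced by `step`.

    A simplified stand-in for one aspect of the 'regular data cube'
    definition above: temporal continuity with no gaps.
    """
    return all(b - a == step for a, b in zip(dates, dates[1:]))

# Evenly spaced dekadal instants -> regular
regular = [date(2021, 1, 1) + timedelta(days=10 * i) for i in range(5)]

# A 20-day jump between the last two instants -> irregular
gappy = [date(2021, 1, 1), date(2021, 1, 11), date(2021, 1, 31)]
```

A real check would of course also cover the spatial and attribute conditions (a), (b), and (d).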
In our discussion, there were two philosophies, as shown in the image below, and we would like to support both: (1) allowing users to define their processes before ML/DL operations, and (2) not bothering the users with the underlying processes.
@jdries cool, I will check out the examples.
Nice, this is exactly what I happen to be working on for the moment, in support of a couple of projects using ML.
Maybe you already know, but openEO has a mechanism to build this kind of convenience function as a combination of existing processes: 'user-defined processes' (UDPs). Using these has a couple of advantages:
I see this case arising more often, so maybe we can create an open source GitHub repo with the definitions of these UDPs. That would allow users to reference the central repo, or allow backends to import those definitions.
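To sketch what such a shared definition might look like: a UDP is essentially a process graph plus a parameter list, serializable as JSON so a central repo could host it and backends could import it. All names, parameters, and defaults below are hypothetical.

```python
# Hypothetical UDP definition for a "regularize" convenience process,
# composed entirely of existing openEO processes. Illustrative only.
import json

udp = {
    "id": "regularize_data_cube",  # hypothetical UDP name
    "parameters": [
        {"name": "data", "schema": {"type": "object", "subtype": "datacube"}},
        {"name": "period", "schema": {"type": "string"},
         "default": "dekad", "optional": True},
    ],
    "process_graph": {
        "agg": {
            "process_id": "aggregate_temporal_period",
            "arguments": {
                "data": {"from_parameter": "data"},
                "period": {"from_parameter": "period"},
                "reducer": {
                    "process_graph": {
                        "mean": {
                            "process_id": "mean",
                            "arguments": {"data": {"from_parameter": "data"}},
                            "result": True,
                        }
                    }
                },
            },
            "result": True,
        }
    },
}

# JSON round-trip: this is the form a central repo would serve.
serialized = json.dumps(udp)
```

Because the definition is plain JSON, versioning and review could happen through ordinary pull requests against the shared repo.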
Now about the actual process:
@PondiB I think it would make sense to make PRs against the ml branch because otherwise all changes from the ML branch will also appear in this PR. This leads to confusion. Please rebase your changes against the ML branch if necessary and set the base branch of the PR to ml.
Sure.
Regularized data cubes are a necessity for machine learning and deep learning on EO time series. This process aims to eliminate the need for users to chain multiple processes to obtain a consistent data cube.