Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0

How to run a UDF on vector cube to access geometry? #449

Open soxofaan opened 1 year ago

soxofaan commented 1 year ago

We're trying to support/implement a use case that depends heavily on vector cube manipulation, in particular running some custom (Python) UDF on the geometry data.

For example, some (GeoJSON) feature collection manipulation use cases that are straightforward to implement the classic way, outside the openEO framework:

How should these be implemented with openEO processes? Note that with vector cubes, a lot of the relevant data to work with is in the labels (geometry data), which is quite different from how we typically work with raster data cubes.

m-mohr commented 1 year ago

It would be good to discuss this based on specific use cases.

For label filtering there's filter_labels, and maybe apply_dimension works, but other processes might be needed. A fallback could be run_udf at the top level, but that doesn't imply a good chunking strategy.
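As a minimal sketch of what label filtering on a geometry dimension means, the following uses a plain-Python stand-in for a vector cube. The dict layout, the sample values, and the helper name filter_labels_toy are assumptions made for illustration; this is not the openEO client or back-end API, only a model of the behaviour of filter_labels.

```python
# Toy stand-in for a vector cube: a 'geometry' dimension whose labels are
# WKT strings, and a 'bands' dimension holding per-geometry properties.
# The layout and all values are hypothetical.
cube = {
    "dims": ("geometry", "bands"),
    "labels": {
        "geometry": [
            "POINT (1 2)",
            "POLYGON ((0 0, 1 0, 1 1, 0 0))",
            "POINT (5 6)",
        ],
        "bands": ["population", "area"],
    },
    "values": [[10, 0.0], [250, 0.5], [40, 0.0]],
}


def filter_labels_toy(cube, condition, dimension):
    """Keep only the entries along `dimension` whose label satisfies `condition`."""
    axis = cube["dims"].index(dimension)
    labels = cube["labels"][dimension]
    keep = [i for i, label in enumerate(labels) if condition(label)]
    if axis == 0:
        # Filtering rows (one row per geometry).
        values = [cube["values"][i] for i in keep]
    else:
        # Filtering columns (one column per band).
        values = [[row[i] for i in keep] for row in cube["values"]]
    new_labels = dict(cube["labels"])
    new_labels[dimension] = [labels[i] for i in keep]
    return {"dims": cube["dims"], "labels": new_labels, "values": values}


# The condition only ever sees the label itself, here the WKT string.
points_only = filter_labels_toy(
    cube, lambda wkt: wkt.startswith("POINT"), "geometry"
)
print(points_only["labels"]["geometry"])
print(points_only["values"])
```

The condition receives nothing but the label, which is enough for selecting geometries but not for computing new values from them.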

jdries commented 1 year ago

To modify the 'bands' (I'm assuming properties are stored in the 'bands' dimension) we typically use apply_dimension(cube,dimension='bands',process=my_callback)

my_callback then gets a labeled array, where the labels are band/property names. What we lack, however, are the 'coordinates' of the labeled array in the other dimensions. Perhaps we can specify how these can be passed into the context object?

m-mohr commented 10 months ago

I don't quite get the question @jdries:

See https://openeo.org/documentation/1.0/datacubes.html#dimensions for details/examples.

So you don't need the context for this; you always have a numerical or string-based label for these dimensions.

jdries commented 10 months ago

Yes, but the question is how to get to those labels from within a callback passed to apply_dimension:

apply_dimension(cube,dimension='bands',process=my_callback)

This passes on a labeled array to my process, where the labels of the array will be band names, but what if I in addition need to know the single geometry label? (So the WKT string, for that labeled array.)

m-mohr commented 10 months ago

You could get all WKT strings of the array elements using array_labels: https://processes.openeo.org/#array_labels

jdries commented 10 months ago

I think that would give me an array of band names in the above example, whereas I'm rather looking for the single label of the 'geometry' dimension.
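A minimal sketch of the point, again with a plain-Python stand-in rather than a real back-end. The helpers apply_dimension_toy and array_labels_toy and the sample data are hypothetical; they only model a geometry x bands cube being processed along 'bands'.

```python
# Hypothetical 2-D cube: rows are geometries, columns are bands.
labels = {
    "geometry": ["POINT (1 2)", "POLYGON ((0 0, 1 0, 1 1, 0 0))"],
    "bands": ["population", "area"],
}
values = [[10, 0.0], [250, 0.5]]


def array_labels_toy(labeled_array):
    """Model of array_labels: return the labels of the given array."""
    return list(labeled_array.keys())


def apply_dimension_toy(values, labels, dimension, process):
    """Model of apply_dimension over 'bands' for a geometry x bands cube."""
    if dimension != "bands":
        raise NotImplementedError("this sketch only handles dimension='bands'")
    result = []
    for row in values:  # one call per geometry
        # The callback receives the values along 'bands', labeled by band
        # name. The geometry label of this row is not passed along.
        labeled_array = dict(zip(labels["bands"], row))
        result.append(process(labeled_array))
    return result


seen_labels = []


def my_callback(data):
    # Record which labels are visible from inside the callback.
    seen_labels.append(array_labels_toy(data))
    return [v * 2 for v in data.values()]


result = apply_dimension_toy(values, labels, "bands", my_callback)
print(seen_labels)
print(result)
```

In this model the callback runs once per geometry, yet every call sees only the band names; the WKT string that identifies the row never reaches it.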

m-mohr commented 10 months ago

What does your data cube look like? It feels like this issue is lacking context. Do you want the labels of the remaining dimensions? So if you run the process above on a cube with geometry and bands dimensions you want the label for the geometry, and if you run over x, y, t you want the labels of x, y, t? Could you explain the use case a bit more? I assume the answer is "you can't do it" for apply_dimension right now. We could add a parameter to the callback that provides the information as an array.
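One way such an extra callback parameter could look, as a rough sketch only. Nothing here is part of the openEO process specification: the function apply_dimension_with_labels, the parameter name other_labels, and the choice of a dict keyed by dimension name (instead of an array) are all assumptions for illustration.

```python
# Hypothetical 2-D cube: rows are geometries, columns are bands.
labels = {
    "geometry": ["POINT (1 2)", "POLYGON ((0 0, 1 0, 1 1, 0 0))"],
    "bands": ["population", "area"],
}
values = [[10, 0.0], [250, 0.5]]


def apply_dimension_with_labels(values, labels, process):
    """Model of apply_dimension over 'bands' that also hands the callback
    the label of each remaining dimension (here only 'geometry')."""
    out = []
    for wkt, row in zip(labels["geometry"], values):
        labeled_array = dict(zip(labels["bands"], row))
        # Hypothetical second argument: labels of the dimensions that are
        # not being iterated over inside the callback.
        out.append(process(labeled_array, {"geometry": wkt}))
    return out


def density(data, other_labels):
    """Example callback that needs both the properties and the geometry."""
    wkt = other_labels["geometry"]
    if wkt.startswith("POLYGON"):
        value = data["population"] / data["area"]
    else:
        value = None  # density is undefined for points in this toy example
    return {"wkt": wkt, "density": value}


results = apply_dimension_with_labels(values, labels, density)
for item in results:
    print(item)
```

For a cube with x, y, t as the remaining dimensions, the same second argument would carry three labels instead of one.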