Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0

Integration over time #457


jdries commented 11 months ago

A use case requires us to sum a band over an irregular time dimension. To do this correctly, the number of days between observations needs to be taken into account.

The question here is whether we need a new process for convenience, or whether we can define a process graph that solves this.

This is somewhat similar to: https://docs.xarray.dev/en/stable/generated/xarray.DataArray.integrate.html
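For reference, xarray's integrate() applies the trapezoidal rule along the coordinate, so irregular gaps between timestamps are weighted automatically. A minimal sketch of that behavior (the values here are made up):

    import pandas as pd
    import xarray as xr

    # Irregularly spaced timestamps: gaps of 3 and 6 days.
    times = pd.to_datetime(["2023-01-01", "2023-01-04", "2023-01-10"])
    da = xr.DataArray([1.0, 2.0, 4.0], coords={"time": times}, dims="time")

    # Trapezoidal rule over the datetime coordinate, expressed in days:
    # (1+2)/2 * 3 + (2+4)/2 * 6 = 22.5
    total = da.integrate("time", datetime_unit="D")
    print(float(total))  # 22.5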

I made an attempt to solve this with existing processes, but couldn't verify it yet because our backend doesn't support all of the details. The approach also has the downside that it is hard for the backend to optimize: we apply a function over the whole time dimension, while we only need information about the next label:

    from openeo.processes import date_difference, array_labels, array_apply

    def weight_by_gap(labeled_array):
        dates = array_labels(labeled_array)
        def weighting(x, index, label):
            # Days until the next observation; note that this indexes past
            # the end of the array for the last label, which still needs
            # special handling. date_difference defaults to seconds, so the
            # unit is set explicitly.
            days = date_difference(label, dates[index + 1], unit='day')
            return x * days
        return array_apply(labeled_array, weighting)

    weighted_dmp = dmp_cube.apply_dimension(dimension='t', process=weight_by_gap)
    weighted_dmp.reduce_dimension(dimension='t', reducer='sum')
clausmichele commented 11 months ago

Interesting use case. In a similar scenario, I first retrieved the temporal labels and then computed the date differences client-side, since I didn't know how to do everything with openEO processes.
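Concretely, that client-side variant might look something like this (a rough sketch; the label values are placeholders, and retrieving them from the backend as well as the cube handling are omitted):

    from datetime import datetime

    # Temporal labels as retrieved from the backend (placeholder values).
    labels = ["2023-01-01", "2023-01-04", "2023-01-10"]
    dates = [datetime.fromisoformat(d) for d in labels]

    # Day gap to the next observation, computed client-side. The last label
    # has no successor, so it gets weight 0 here; how to treat it is exactly
    # the kind of edge case this thread still has to settle.
    weights = [(b - a).days for a, b in zip(dates, dates[1:])] + [0]
    # -> [3, 6, 0]

The resulting plain numbers can then be embedded directly into the process graph.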

m-mohr commented 11 months ago

You could compute the differences only once and pass them via the context parameter into reduce_dimension, right? (Probably not valid Python client code, but it should illustrate the idea well enough.)

    def weighting(x, index, label, context):
        # x is the current date label; context carries the full list of
        # dates, so the next label is context[index + 1] (out of bounds
        # for the last label, as above).
        return date_difference(x, context[index + 1], unit='day')

    dates = dmp_cube.dimension_labels('t')
    weights = array_apply(dates, weighting, context=dates)

    def reducer(data, context):
        # array_combine stands in for element-wise multiplication of the
        # pixel values with the precomputed weights.
        return sum(array_combine(data, context, 'multiply'))

    # Reduce the original cube; the weights are applied inside the reducer,
    # so this starts from dmp_cube rather than the pre-weighted cube.
    dmp_cube.reduce_dimension(dimension='t', reducer=reducer, context=weights)

Would this be faster/better?
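Note that weighting each value by the days until the next label amounts to a left rectangle rule, rather than the trapezoidal rule that xarray's integrate uses. A small numpy sketch of the numeric difference (values again made up):

    import numpy as np

    days = np.array([0.0, 3.0, 9.0])  # observation times in days, irregular
    vals = np.array([1.0, 2.0, 4.0])

    gaps = np.diff(days)                          # [3., 6.]
    rectangle = float(np.sum(vals[:-1] * gaps))   # 1*3 + 2*6 = 15.0
    trapezoid = float(np.sum((vals[:-1] + vals[1:]) / 2 * gaps))  # 22.5
    print(rectangle, trapezoid)

Which of the two schemes a dedicated process should offer (or whether both) would be part of the design decision here.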

Additional questions: