Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0

Integration over time #457


jdries commented 11 months ago

A use case requires us to sum a band over an irregular time dimension. To do this correctly, the number of days between observations needs to be taken into account.

The question here is whether we need a new process for convenience, or whether we can define a process graph that solves this.

This is somewhat similar to: https://docs.xarray.dev/en/stable/generated/xarray.DataArray.integrate.html
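For reference, xarray's integrate() applies the trapezoidal rule along the coordinate, so irregular gaps between timestamps are weighted automatically. A minimal sketch of that behavior (the values here are made up):

    import pandas as pd
    import xarray as xr

    # Irregularly spaced timestamps: gaps of 3 and 6 days.
    times = pd.to_datetime(["2023-01-01", "2023-01-04", "2023-01-10"])
    da = xr.DataArray([1.0, 2.0, 4.0], coords={"time": times}, dims="time")

    # Trapezoidal rule over the datetime coordinate, expressed in days:
    # (1+2)/2 * 3 + (2+4)/2 * 6 = 22.5
    total = da.integrate("time", datetime_unit="D")
    print(float(total))  # 22.5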

I made an attempt to solve this with existing processes, but couldn't verify it yet because our backend doesn't support all of the details. The approach also has the downside that it is hard for the backend to optimize: we apply a function over the whole time dimension, while we only need information about the next label:

    from openeo.processes import date_difference, array_labels, array_apply

    def weight_by_gap(labeled_array):
        dates = array_labels(labeled_array)
        def weighting(x, index, label):
            # Days until the next observation; note that this indexes past
            # the end of the array for the last label, which still needs
            # special handling. date_difference defaults to seconds, so the
            # unit is set explicitly.
            days = date_difference(label, dates[index + 1], unit='day')
            return x * days
        return array_apply(labeled_array, weighting)

    weighted_dmp = dmp_cube.apply_dimension(dimension='t', process=weight_by_gap)
    weighted_dmp.reduce_dimension(dimension='t', reducer='sum')
clausmichele commented 11 months ago

Interesting use case. In a similar scenario, I first retrieved the temporal labels and then computed the date differences client-side, since I didn't know how to do everything with openEO processes.
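Concretely, that client-side variant might look something like this (a rough sketch; the label values are placeholders, and retrieving them from the backend as well as the cube handling are omitted):

    from datetime import datetime

    # Temporal labels as retrieved from the backend (placeholder values).
    labels = ["2023-01-01", "2023-01-04", "2023-01-10"]
    dates = [datetime.fromisoformat(d) for d in labels]

    # Day gap to the next observation, computed client-side. The last label
    # has no successor, so it gets weight 0 here; how to treat it is exactly
    # the kind of edge case this thread still has to settle.
    weights = [(b - a).days for a, b in zip(dates, dates[1:])] + [0]
    # -> [3, 6, 0]

The resulting plain numbers can then be embedded directly into the process graph.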

m-mohr commented 11 months ago

You could compute the differences only once and pass them via the context parameter into reduce_dimension, right? (Probably not valid Python client code, but it should illustrate the idea well enough.)

    def weighting(x, index, label, context):
        # x is the current date label; context carries the full list of
        # dates, so the next label is context[index + 1] (out of bounds
        # for the last label, as above).
        return date_difference(x, context[index + 1], unit='day')

    dates = dmp_cube.dimension_labels('t')
    weights = array_apply(dates, weighting, context=dates)

    def reducer(data, context):
        # array_combine stands in for element-wise multiplication of the
        # pixel values with the precomputed weights.
        return sum(array_combine(data, context, 'multiply'))

    # Reduce the original cube; the weights are applied inside the reducer,
    # so this starts from dmp_cube rather than the pre-weighted cube.
    dmp_cube.reduce_dimension(dimension='t', reducer=reducer, context=weights)

Would this be faster/better?
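Note that weighting each value by the days until the next label amounts to a left rectangle rule, rather than the trapezoidal rule that xarray's integrate uses. A small numpy sketch of the numeric difference (values again made up):

    import numpy as np

    days = np.array([0.0, 3.0, 9.0])  # observation times in days, irregular
    vals = np.array([1.0, 2.0, 4.0])

    gaps = np.diff(days)                          # [3., 6.]
    rectangle = float(np.sum(vals[:-1] * gaps))   # 1*3 + 2*6 = 15.0
    trapezoid = float(np.sum((vals[:-1] + vals[1:]) / 2 * gaps))  # 22.5
    print(rectangle, trapezoid)

Which of the two schemes a dedicated process should offer (or whether both) would be part of the design decision here.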

Additional questions: