Open-EO / openeo-processes-dask

Python implementations of many OpenEO processes, dask-friendly by default.
Apache License 2.0
19 stars 14 forks source link

Inconsistent behavior of reducers "max", "median" and "mean" with respect to "nan" #181

Closed julianmeyerarnek closed 12 months ago

julianmeyerarnek commented 1 year ago

At current, the behavior of

For "median": ignore_data is set to True per default (https://github.com/Open-EO/openeo-processes-dask/blob/7ddf6c2aae7d8befa22d3746eacef16c83ba31b0/openeo_processes_dask/process_implementations/math.py#L122), whereas ignore_data for mean is set to False per default (https://github.com/Open-EO/openeo-processes-dask/blob/7ddf6c2aae7d8befa22d3746eacef16c83ba31b0/openeo_processes_dask/process_implementations/math.py#L129)

I suggest to have a consistent handling of nan for all reducers and would prefer the behavior of median.

And: I seems to me that I cannot "overrule" the pre-defined behavior - not even by providing kw-args.

An example Jupyter Notebook comparing mean and median-reducers can be found here: https://github.com/julianmeyerarnek/S5P_L3_O3_openEO

m-mohr commented 1 year ago

The default specified in the openEO processes is ignore_nodata=True, so all processes should default to True (i.e. the behavior of median)

clausmichele commented 1 year ago

Thanks @julianmeyerarnek to report this. Before that the fix gets merged https://github.com/Open-EO/openeo-processes-dask/pull/183, you can use this workaround to set ignore_nodata=True explicitly:

from openeo.processes import mean
def mean_reducer(data):
    return mean(data,ignore_nodata=True)
task_reduce_lon_mean = task_reduce_time_mean.reduce_dimension('x', reducer=mean_reducer)

which would produce the correct result: Untitled

@ValentinaHutter @GeraldIr it would be necessary to check why the tests didn't show this behaviour and add a specific one if it's missing

.