Open-EO / openeo-processes-dask

Python implementations of many OpenEO processes, dask-friendly by default.
Apache License 2.0

bug in aggregate_temporal_period #159

Closed. clausmichele closed this issue 1 year ago.

clausmichele commented 1 year ago

It seems that aggregate_temporal_period does not work as expected.

I attach sample code to reproduce the issue. The result from the openeo-processes-dask implementation is the same value for all three time steps, while the manually computed result clearly shows three different values.

I really don't understand where this issue comes from. @ValentinaHutter @GeraldIr, could you please help me understand what's wrong? I would need to fix it asap.

    import numpy as np
    from openeo.local import LocalConnection

    local_conn = LocalConnection("./")

    # Load a small Sentinel-2 L2A cube from the Planetary Computer STAC API.
    url = "https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-2-l2a"
    spatial_extent = {"west": 11.296, "south": 46.459, "east": 11.297, "north": 46.4595}
    temporal_extent = ["2023-06-01", "2023-08-31"]
    bands = ["B04", "B03", "B02"]
    properties = {"eo:cloud_cover": dict(lt=30)}
    datacube = local_conn.load_stac(
        url=url,
        spatial_extent=spatial_extent,
        temporal_extent=temporal_extent,
        bands=bands,
        properties=properties,
    )
    aggregated = datacube.aggregate_temporal_period(period="month", reducer="median")

    datacube_xr = datacube.execute().compute()
    aggregated_xr = aggregated.execute().compute()

    # Manual reference: monthly median computed directly with xarray/numpy.
    aggregated_manual = datacube_xr.resample({"time": "M"}).reduce(np.median)

    print(aggregated_xr[:, 0, 0, 0].values)
    print(aggregated_manual[:, 0, 0, 0].values)

[1239. 1239. 1239.]
[1239. 1309. 1312.]
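
For reference, here is a minimal self-contained sketch of the expected behaviour, using synthetic data rather than the Sentinel-2 cube above: a monthly median resample should yield one distinct value per month when the underlying months differ.

    import numpy as np
    import pandas as pd
    import xarray as xr

    # Three months of synthetic daily data at clearly different levels.
    time = pd.date_range("2023-06-01", "2023-08-31", freq="D")
    values = np.concatenate([
        np.full(30, 1239.0),  # June
        np.full(31, 1309.0),  # July
        np.full(31, 1312.0),  # August
    ])
    da = xr.DataArray(values, coords={"time": time}, dims="time")

    # A correct monthly median keeps the months apart.
    print(da.resample(time="M").median().values)  # [1239. 1309. 1312.]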
clausmichele commented 1 year ago

I ran a test, replacing the reducer in aggregate.py:

https://github.com/Open-EO/openeo-processes-dask/blob/22a9c6e2c46bdec54cf2b66ffa3cbca94cfa22b4/openeo_processes_dask/process_implementations/cubes/aggregate.py#L106C1-L109C6

    return resampled_data.reduce(
        reducer, keep_attrs=True, positional_parameters=positional_parameters
    )

to

    from functools import partial
    import numpy as np

    reducer = partial(np.nanmedian)
    return resampled_data.reduce(
        reducer, keep_attrs=True
    )

which returns the right result for the example above. So the issue is somewhere in the process mapping.

GeraldIr commented 1 year ago

@clausmichele After a lot of searching, I've identified the bug: it is in our openeo-pg-parser-networkx package, which caches results and thus reuses the first data point it calculates. I've fixed it in the new release of that package and bumped the version in the relevant places on our end.
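
As a hypothetical illustration of this failure mode (not the actual parser code): if a wrapper memoizes a reducer's result without keying the cache on the input data, every group after the first silently receives the first group's answer.

    import numpy as np

    # Hypothetical sketch, not openeo-pg-parser-networkx code: a result
    # cache that ignores the input data reuses the first computed value.
    _cache = {}

    def cached_reducer(data, axis=None, name="median"):
        if name not in _cache:  # bug: cache key omits the data itself
            _cache[name] = np.nanmedian(data, axis=axis)
        return _cache[name]

    print(cached_reducer(np.array([1239.0])))  # 1239.0
    print(cached_reducer(np.array([1309.0])))  # 1239.0 again -- stale cache
    print(cached_reducer(np.array([1312.0])))  # 1239.0 again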

Make sure you have the newest version of that package (2023.8.2) and try again; it should work then.
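
For example, in a pip-based environment:

    pip install --upgrade "openeo-pg-parser-networkx>=2023.8.2"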

clausmichele commented 1 year ago

Thanks, I will try it out immediately!

Edit: it works, thanks!