Open-EO / openeo-processes-dask

Python implementations of many OpenEO processes, dask-friendly by default.
Apache License 2.0

Generic implementations for load_collection and save_result #9

Closed. LukeWeidenwalker closed this issue 2 months ago

LukeWeidenwalker commented 1 year ago

This came out of a private chat with @clausmichele, in which we were talking about why load_collection isn't currently implemented in this repo.

The reason is basically this:

For these reasons I really don't see how we'd be able to write a generic implementation for either load_collection or save_result at the moment, and that is why they aren't currently included in this repo.

A simple workaround is to do the following wherever you execute the process (before executing it, obviously!):

def load_collection(*args, **kwargs):
    ...

def save_result(*args, **kwargs):
    ...

process_registry["load_collection"] = load_collection
process_registry["save_result"] = save_result

And define what these processes should do there.
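As a rough illustration of the workaround above, here is a minimal sketch using only the standard library. Everything in it is an assumption made for the example: `process_registry` is a plain dict standing in for whatever registry you actually use, `FAKE_COLLECTIONS` is a hypothetical in-memory catalogue, and the datacube is a plain dict rather than an xarray object. A real backend would replace the bodies with its own data access and export logic.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the real process registry: here just a plain dict mapping
# process ids to callables (an assumption made for this sketch).
process_registry = {}

# Hypothetical in-memory "catalogue"; a real backend would query its own storage.
FAKE_COLLECTIONS = {
    "demo": {"bands": ["B04", "B08"], "values": [[1, 2], [3, 4]]},
}


def load_collection(id, bands=None, **kwargs):
    """Return the requested collection, optionally restricted to some bands."""
    collection = FAKE_COLLECTIONS[id]
    if bands is None:
        return dict(collection)
    # Indices of the bands to keep, in catalogue order.
    keep = [i for i, name in enumerate(collection["bands"]) if name in bands]
    return {
        "bands": [collection["bands"][i] for i in keep],
        "values": [[row[i] for i in keep] for row in collection["values"]],
    }


def save_result(data, format="JSON", options=None, **kwargs):
    """Write the result to a temporary JSON file and return its path."""
    if format.upper() != "JSON":
        raise ValueError(f"This sketch only supports JSON, got {format!r}")
    out_path = Path(tempfile.mkdtemp()) / "result.json"
    out_path.write_text(json.dumps(data))
    return out_path


# Register the backend-specific implementations before executing anything.
process_registry["load_collection"] = load_collection
process_registry["save_result"] = save_result

cube = process_registry["load_collection"](id="demo", bands=["B08"])
path = process_registry["save_result"](data=cube, format="JSON")
```

The point is only the registration pattern: both functions are defined next to the code that executes the process and are added to the registry before execution starts.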

LukeWeidenwalker commented 1 year ago

@SerRichard

clausmichele commented 1 year ago

Thanks for the clarification. Documenting how to implement a basic load_collection and save_result could already be a good start! I will create a PR for this, since it is needed anyway for client-side processing.

clausmichele commented 1 year ago

@LukeWeidenwalker opendatacube (datacube) is a dependency of this project. It is used in aggregate_spatial, which assumes that the data comes from opendatacube (it relies on the geobox attribute).

How do we address this? A different implementation of load_collection that is not based on ODC would make this process unusable.

Edit: there is a similar issue with resample_cube_spatial, which also relies on the geobox attribute.

LukeWeidenwalker commented 1 year ago

Ah, that is a very good point, I hadn't realised that connection before!

I think we could use https://github.com/opendatacube/odc-geo explicitly to make sure that this geobox object exists on all datacubes (possibly enforcing this with _normalise_output_datacube). Bit hesitant because aggregate_spatial might yet change to not need this after all, but at least for resample_cube_spatial I've already done some shopping around other libraries and haven't found anything that achieves this spatial resampling in a way that is dask-friendly. What do you think?
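To make the enforcement idea concrete, here is a minimal sketch of a wrapper that rejects any process output lacking a geobox attribute. It is only an illustration under stated assumptions: `require_geobox` is a hypothetical helper, the datacubes are `SimpleNamespace` stand-ins rather than xarray objects, and it does not show how odc-geo attaches a geobox or how _normalise_output_datacube works in this repo.

```python
from functools import wraps
from types import SimpleNamespace


def require_geobox(process):
    """Wrap a process so that its output must expose a `geobox` attribute."""

    @wraps(process)
    def wrapper(*args, **kwargs):
        result = process(*args, **kwargs)
        # Fail early instead of letting a downstream process break later.
        if getattr(result, "geobox", None) is None:
            raise ValueError(
                f"{process.__name__} returned a datacube without a geobox"
            )
        return result

    return wrapper


@require_geobox
def load_with_geobox(**kwargs):
    # Stand-in datacube; a real one would be an xarray object.
    return SimpleNamespace(data=[[0.0]], geobox="placeholder geobox")


@require_geobox
def load_without_geobox(**kwargs):
    # Simulates a load_collection implementation that is not based on ODC.
    return SimpleNamespace(data=[[0.0]])


cube = load_with_geobox()

try:
    load_without_geobox()
    rejected = False
except ValueError:
    rejected = True
```

With a check like this, a load_collection that does not provide a geobox fails at load time with a clear message, rather than later inside resample_cube_spatial.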

clausmichele commented 2 months ago

Sample implementations are mentioned in the docs here: https://openeo.org/documentation/1.0/developers/backends/xarray.html#the-load-collection-and-save-result-process