Closed: LukeWeidenwalker closed this 2 months ago
@SerRichard
Thanks for the clarification. Documenting how to implement a base `load_collection` and `save_result` could already be a good start! I will create a PR for this, since it is needed anyway for client-side processing.
@LukeWeidenwalker opendatacube (`datacube`) is a dependency of this project. It is used in `aggregate_spatial`, which assumes that the data comes from opendatacube (for the `geobox` attribute). How do we address this? Having a different implementation of `load_collection` that is not based on ODC would make this process unusable.

Edit: similar issue with `resample_cube_spatial`, which relies on the `geobox` attribute.
Ah, that is a very good point, I hadn't realised that connection before!

I think we could use https://github.com/opendatacube/odc-geo explicitly to make sure that this geobox object exists on all datacubes (possibly enforcing this with `_normalise_output_datacube`). I'm a bit hesitant because `aggregate_spatial` might yet change to not need this after all, but at least for `resample_cube_spatial` I've already done some shopping around other libraries and haven't found anything that achieves this spatial resampling in a dask-friendly way. What do you think?
Sample implementations are mentioned in the docs here: https://openeo.org/documentation/1.0/developers/backends/xarray.html#the-load-collection-and-save-result-process
This came out of a private chat with @clausmichele: we were talking about why `load_collection` isn't currently implemented in this repo. The reason is basically this:

- The `id` parameter in `load_collection` needs to be statically resolved within the backend. Previously this got resolved in openeo-odc in a terrible way. Now, in our backend, we define a custom `load_collection` function (with an `EODC_LOCAL_STAC_MAP` that maps each collection_id to where we store it) and register it to the process_registry later. This doesn't belong in this repo, because it's backend-specific and I really want to keep this repo free of configuration.
- If `load_collection` took a STAC collection instead, this wouldn't be a problem and we could just build a generic way to load xarrays from STAC collections. That would make our lives easier, so we'll probably start discussing this at https://github.com/Open-EO/openeo-processes/issues/377
- Another option is to implement `load_stac_collection` in openeo-processes-dask, and define custom `load_collection`s to do this mapping wherever the process graph is actually executed (our backend, client-side processing).
- `save_result` has the same problem: how results are stored is backend-specific.
For these reasons I really don't see how we'd be able to write a generic implementation for either `load_collection` or `save_result` at the moment, and that is why they aren't currently included in this repo. A simple workaround is to register these processes yourself wherever you execute the process graph (beforehand, obviously!), and define there what they should do.
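Concretely, the workaround might look something like the following sketch, where the plain-dict registry stands in for the project's actual process registry and `COLLECTION_MAP` is a hypothetical analogue of the backend-specific `EODC_LOCAL_STAC_MAP`:

```python
# Sketch of the workaround: before executing a process graph, register your
# own backend-specific load_collection and save_result implementations.
# The dict registry and COLLECTION_MAP are stand-ins, not real project APIs.
process_registry = {}

COLLECTION_MAP = {"my-collection": "/data/my-collection.zarr"}  # hypothetical


def load_collection(id, **kwargs):
    """Resolve the collection id to local storage and load it (stubbed)."""
    path = COLLECTION_MAP[id]
    return {"source": path}  # real code would return an xarray datacube


def save_result(data, format="netcdf", **kwargs):
    """Persist the datacube in the requested format (stubbed)."""
    return f"saved {data['source']} as {format}"


# Register them wherever the process graph is actually executed:
process_registry["load_collection"] = load_collection
process_registry["save_result"] = save_result

cube = process_registry["load_collection"](id="my-collection")
result = process_registry["save_result"](cube, format="netcdf")
```

Because the registration happens at execution time (in the backend, or in client-side processing), this repo itself stays free of any collection-to-storage configuration.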