Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0
48 stars 15 forks source link

Implicit resampling for spatial dimensions in mask and merge_cubes + clarifications #405

Closed m-mohr closed 1 year ago

m-mohr commented 1 year ago

mask and merge_cubes: The spatial dimensions x and y can now be resampled implicitly instead of throwing an error. Also, clarify a couple of inconsistencies and conflicting descriptions. Depending on what the reader assumed as the truth, this might be breaking.

Implements #402 Fixes #379

LukeWeidenwalker commented 1 year ago

I understand that the reproducability issue with this change has already been discussed (and accepted) in the PSC. With that disclaimed, I still feel obligated to ask: what exactly is the principle by which it's okay for this process to violate reproducability? And is this really so difficult to fix? What would be wrong about a resampler parameter, that at least explicitly encodes what's going on as a sub-process-graph? If that's too much effort, resample_cube_spatial lists loads of resampling methods, couldn't we then at least pick one from those and commit merge_cubes in its current form to always use that (under the assumption that the resampling method used is the only source of ambiguity here)? Let me know if I'm missing something here!

Btw, if folks are totally over this discussion and this is deemed not worth the effort by everyone, just ignore this comment - the content of the PR looks fine to me!

m-mohr commented 1 year ago

@LukeWeidenwalker This has not been accepted by the PSC yet, we just discussed it but did not vote on it yet.

I'm not 100% sure why this is hurting reproducibility? What we propose here is to just implicitly call the resample process with default parameters if required. As such this should be as reproducible as an explicit call to the resample process. I guess it's not very explicit in the process description that this is meant to use the default parameters?

LukeWeidenwalker commented 1 year ago

I'm not 100% sure why this is hurting reproducibility?

Because without more info this doesn't specify which resampling method was used!

What we propose here is to just implicitly call the resample process with default parameters if required. As such this should be as reproducible as an explicit call to the resample process. I guess it's not very explicit in the process description that this is meant to use the default parameters?

Ah, yes, IMHO this should be explicit. And since the default parameters can change, I'd also hard-code what this default parameter is expected to be. E.g. "The implicit resampling is performed as a resample_cube_spatial with method="near"".

m-mohr commented 1 year ago

Okay, I'll make this more explicit. I'm not sure whether It's not a good idea to also mention the default values. I assume by default back-ends don't change the default and if they change it then due to the reason that they don't support it. This means they could not implement this functionality here either and need to change the description, too. So it's not reproducible anyway...