Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0
48 stars 16 forks source link

MultiPolygon support for mask_polygon #237

Closed soxofaan closed 3 years ago

soxofaan commented 3 years ago

https://github.com/Open-EO/openeo-processes/blob/75059c6566a7c8951d04467665252e74d5bb9c3a/mask_polygon.json#L20

A GeoJSON object containing a polygon. The provided feature types can be one of the following: A Polygon geometry, a GeometryCollection containing Polygons, a Feature with a Polygon geometry or a FeatureCollection containing Features with a Polygon geometry.

Why is MultiPolygon not allowed, but GeometryCollection is, while they are equivalent in the context of masking?

Related: aggregate_spatial describes behavior for points and lines as well, so it might make sense to also copy that to mask_polygon. I vaguely remember we discussed this before, but I can't reference right now.

As far as I could recover from git, the whitelist was there from the start (added in #20), but I can't find a reason for the excluded types.

m-mohr commented 3 years ago

MultiPolygon should indeed be allowed, I think. I'm actually also thinking, why not just make the process mask_spatial and allow all geometries as defined also in aggregate_spatial? The only reason is probably "breaking change"?

m-mohr commented 3 years ago

Probably something to do together with the vector data cube update for #68 as this would change the process anyway.

soxofaan commented 3 years ago

why not just make the process mask_spatial and allow all geometries as defined also in aggregate_spatial?

I'm not sure, for masking you also have mask for masking with a raster mask, which is also spatial, so mask_spatial would be confusing. It could make sense to go for mask_geometry though.

jdries commented 3 years ago

Don't really see why this moved from a very simple, backwards compatible, update to the spec, to a breaking change? Breaking changes really require a lot more motivation than what we have here. (The last rename of masking processes caused quite a bit of breakage, time loss and subsequent complaints from my users, hence the concern.)

soxofaan commented 3 years ago

yes indeed, I just wanted to raise the question about why certain geometry types are excluded for mask_polygon (in VITO backend we do support MultiPolygon for example).

The discussion about rename is something else and a lot lower priority for me

m-mohr commented 3 years ago

Strictly speaking, adding MultiPolygon is also a breaking change. It "breaks" interoperability with back-ends that do not update to follow to implement the new version with MultiPolygon. Nevertheless, it feels like we have somehow converged towards calling only user-facing changes breaking or so?

Anyway, I have tagged this as both "minor" and "breaking", which means we could consider the addition of Multipolygon in v1.1 and later - once we decide to do more breaking changes anyway - do the rename. The reason for the rename would be that mask_polygon is unintuitive if it allows all kinds of geometries and it would deprecate mask_polygon and add the new process.

m-mohr commented 3 years ago

At which point did we rename mask? I can't remember what kind of change that was...

soxofaan commented 3 years ago

103 #110: here we did the split between mask and mask_polygon (which was a single process previously)

m-mohr commented 3 years ago

Ah, I see... so there it was a change in an existing process, while here it would be a new process and the old one could still be available as a deprecated process for a certain amount of time. So I don't see a lot of things to break.

Another question is whether the integration of #68 would actually allow us to remove the special case of GeoJSON from processes and simply allow vector cubes (i.e. deprecate mask_polygon). They could be created with "get_vector" (whatever name it will have; see the python driver PR) and used in mask instead of directly passing GeoJSON to mask_polygon. But that is something that will arise from discussions around #68.

Anyway, we need to clarify whether such a change (i.e. adding MultiPolygon support) is considered to be a breaking change.

soxofaan commented 3 years ago

Adding MultiPolygon support to the mask_polygon spec doesn't seem like a breaking change. Very strictly speaking you could argue that it is indeed ("breaks interoperability"), but then every new feature or process is a breaking change, no?

m-mohr commented 3 years ago

I've opened PR #242, but before merging, I'd like to clarify what the direction is wrt such changes.

Adding new processes doesn't feel like breaking changes as there's no requirement to implement all processes. Also, this case is a bit different in that the change is just in a description and is not exposed in a machine-readable way through schemas (although we could add them...)

jdries commented 3 years ago

Interesting indeed to have the discussion about breaking changes. Let me try to illustrate what the impact is of both scenario's: Scenario 1: Process definition is extended to support more inputs Scenario 2: A new process, more broadly defined, process is created, old process is deprecated.

Scenario 1 impact:

Scenario 2 impact:

Let me know if I missed something, but based on this, I really prefer scenario 1 as it keeps the general level of annoyance in a community a bit lower.

m-mohr commented 3 years ago

The scenarios do not fully reflect the discussion above: Just for the multi polygons there was no name change proposed, see the corresponding PR. Scenario 2 was meant to also add support for all other geometries (line, point, ...), which would add to scenario 1 support channel cases where a user doesn't find the right process as it's called mask_polygon, while it may support points and lines.

Scenario 1 also doesn't fully go into the "change the back-end" use case. In scenario 2 it is easier to detect why a process is not running (process not available), while in scenario 1 you need to either go into details with the process specification to figure out changes from the description or the schema or run it to find out why it's not working on another back-end. That would also lead to support channel cases.

But this will be a case-by-case decision of the PSC anyway