Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0
48 stars 15 forks source link

Unit in vector processes #330

Closed m-mohr closed 1 year ago

m-mohr commented 2 years ago

With mostly all vector processes we hit the point where you need to specify a length/distance/... and that needs to be specified in a specific unit. So we have seen different proposals across the PRs:

  1. Always meters
  2. Always degree
  3. Unit of the SRS (usually either meter or degree)

There are several issues with the options:

  1. What does "meters" mean specifically? e.g. https://github.com/Open-EO/openeo-processes/pull/314#discussion_r792528112 This is probably the most user-friendly option?
  2. What does "degree" mean specifically? e.g. https://github.com/Open-EO/openeo-processes/pull/315#discussion_r814799187
  3. Users need to explicitly understand the SRS, which is the least user-friendly option.

I think it's not a good idea to discuss this for every new process. So what's the preferred option that we adopt in general?

PS: In existing (raster) processes we have:

soxofaan commented 2 years ago

Given the tight timing and the fact that this issue is a blocker for other issues, a simple proposal to have an initial solution that allows to be improved in later stages:

It's not ideal because it implies lat-lon degrees for now, but at least it works, unblocks other issues and is future proof (I think)

In a later stage we can add support for a user-chosen unit in some way, e.g.:

mkadunc commented 2 years ago

From my experience the "default distance is the unit of CRS" just leads to a lot of errors made by users. I suggest making the unit parameter mandatory from the start, which will ensure that the users understand what the number they're inputting means.

If backends don't want to convert or if there's ambiguity in how the conversion happens, an error can be thrown in case the specified unit doesn't match the CRS unit.

edzer commented 2 years ago

I agree. And also: degree can never be a distance unit.

soxofaan commented 2 years ago

So if I understand correctly, the "unit = unit of CRS" approach is not a favorable option, even when it's just an initial solution?

I suggest making the unit parameter mandatory

Instead of a separate unit parameter, I would be more in favor of defining the distance as an object containing both the value and unit, e.g. like

    "distance":  {"value": 10, "unit": "m"},

We already use that approach in apply_neighborhood. I think it's schema-wise easier and more self-contained.

m-mohr commented 2 years ago

I'm not sure whether it's the not favorable option or whether @mkadunc just meant that the user should explicitly specify that he wants to specify the value in the unit of the CRS. So should users always explicitly specify a specific unit or is an option for "unit of srs" that the users explicitly selects fine, too?


On the other hand, in date_shift we have a value and a unit parameter separately ;-) The schemas are simpler with two parameters, but the simplicity of the schemas should not drive our decision too much.

mkadunc commented 2 years ago

"unit_of_crs" would also be acceptable, though not needed IMO (almost all CRSs will have either degree or meter as units, and it shouldn't be too difficult for the user to pick the correct one).

I prefer the syntax with an object — distance: {"value": 10, "unit": "m"}.

Re Edzer's opposition to having 'degree' as a distance unit — not sure what to do when CRS is geographic... Users will treat 'degrees' as just another unit, so it will be very hard to dissuade them, unless we come up with a name for this quantity that isn't "distance" (but e.g. "quasi_euclidean_2_norm_of_coordinate_difference" 😉 ). This is probably not a discussion for this issue, but something that should be addressed elsewhere.

m-mohr commented 1 year ago

Quick survey:

m-mohr commented 1 year ago

Vector processes only allow distances in meters.

e.g. in vector_buffer would change from "The distance of the buffer in the unit of the spatial reference system." to "The distance of the buffer in meters." and keep the parameter numerical.

Vote:

For GeoJSON this may need explicit or implicit reprojection, e.g. through load_geojson or vector_reproject -> #412

++
oooo
--
m-mohr commented 1 year ago

The decision from https://github.com/Open-EO/openeo-processes/issues/414#issuecomment-1492003851 (which we also based on this issue) also applies here. We are strict with the errors and throw an error if not meters. We add a chapter to the implementation guideline that back-ends can diverge from this and still implement more.