Open-EO / openeo.org

openeo.org landing page
https://openeo.org
Apache License 2.0

Vector data cubes (overview) #58

Open m-mohr opened 2 years ago

m-mohr commented 2 years ago

What we need to do to add vector data cubes in openEO:

Questions:

  1. Do we want to restrict geometries to one geometry type per vector dimension?
    • Tendency: No, allow mixtures
  2. Do we restrict to only Point, LineString, Polygon, and the Multi-variants (and thus exclude e.g. PolyhedralSurface)? We already discourage GEOMETRYCOLLECTION in several processes.
    • Tendency: Yes, restrict to the types mentioned above
  3. How do we handle null/empty geometries?
    • Tendency: Don't allow them / skip them during import
  4. Representation of "dimension labels" (in STAC: "values")?
    • In metadata: ID, WKT, or GeoJSON (see STAC data cube extension PR)
    • In processes: It is just a representation so we can do multiple things, e.g. allow users to choose between WKT and ID. Or we need to decide on one of them. We can't really use "1D vector cubes" as labels (unless we change it).
  5. How to handle units in processes? See https://github.com/Open-EO/openeo-processes/issues/330
  6. Define (and describe) in general how to convert vector data into a vector data cube: https://github.com/Open-EO/openeo-processes/issues/346#issuecomment-1073758539 There is a proposal from Brockmann; GeoJSON could be aligned with STAC (datetime in properties).
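
A minimal sketch of how the tendencies for questions 1 to 3 could look during import, assuming geometries arrive as GeoJSON-like Python dicts. The names `ALLOWED_TYPES` and `filter_geometries` are hypothetical and not part of any openEO API.

```python
# Hypothetical import filter; all names are illustrative, not openEO API.
ALLOWED_TYPES = {
    "Point", "LineString", "Polygon",
    "MultiPoint", "MultiLineString", "MultiPolygon",
}


def filter_geometries(geometries):
    """Skip null/empty geometries and reject unsupported geometry types."""
    kept = []
    for geom in geometries:
        # Null geometry: skipped during import (tendency for question 3).
        if geom is None:
            continue
        # Only the basic types and their Multi-variants (question 2).
        if geom.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"Unsupported geometry type: {geom.get('type')}")
        # Empty geometry: no coordinates at all, also skipped.
        if not geom.get("coordinates"):
            continue
        kept.append(geom)
    return kept


geoms = [
    {"type": "Point", "coordinates": [11.35, 46.5]},
    None,
    {"type": "LineString", "coordinates": []},
    {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
]
# Mixed geometry types in one dimension are allowed (question 1).
result = filter_geometries(geoms)
```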

mkadunc commented 2 years ago

  1. Do we want to restrict geometries to one geometry type per vector dimension?

I'm not a fan of restricting in the standard; maybe, if restricting would be required for easier implementation, we could add this info to the backend capabilities.

It would be useful to have metadata about types in a dimension for specific data cubes, though - i.e. if I load a vector cube, it would be good to know which geometry types to expect for the labels on the spatial dimension.

  1. Point, LineString, Polygon, and the Multi-variants...

I agree that we leave out PolyhedralSurface etc. (for now). GeometryCollection is borderline - some vector operations might return GC in which case we'll have to "normalize" the results to the higher-dimensional type (e.g. an intersection of two linestrings will most likely be a point, but could also be a linestring; if we support only one type, we'll have to represent all points as degenerate single-point linestrings).
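
The "normalize to the higher-dimensional type" idea can be sketched as follows, assuming GeoJSON-like dicts and a dimension that only supports LineString. The function name `promote_to_linestring` is hypothetical. A GeoJSON LineString needs at least two positions, so the point is repeated to form a degenerate line.

```python
def promote_to_linestring(geom):
    """Represent a Point as a degenerate single-point LineString."""
    if geom["type"] == "LineString":
        return geom
    if geom["type"] == "Point":
        position = list(geom["coordinates"])
        # Repeat the position: start and end of the line coincide.
        return {"type": "LineString", "coordinates": [position, list(position)]}
    raise ValueError(f"Cannot promote geometry type: {geom['type']}")


# E.g. two linestrings that cross in a single point.
intersection = {"type": "Point", "coordinates": [1.0, 2.0]}
normalized = promote_to_linestring(intersection)
```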

Having looked at OGC EDR, it seems that support for XYZ and XYM / XYZM geometries would be useful.

  1. Representation of "dimension labels" (in STAC: "values")?

I'd say GeoJSON (the 'non-standard' one with CRS).

I don't think ID is necessary - if you strip geometry values from a vector cube, it becomes a non-vector data-cube IMO.

  • Do we need the actual geometries in callbacks?

I'd say yes - let's treat geometry labels the same as any other labels (e.g. named bands).

  1. How do we handle processes that now require "raster-cubes"?

Rename raster-cube to data-cube in the schema and replace everywhere. Then introduce raster-cube as a subclass, and use it instead of data-cube in processes that do special things with raster spatial dimensions (x,y).

  1. What name do we recommend for the vector dimension?

geometry seems better than vector. feature would also be an option, or reference-geometry

m-mohr commented 2 years ago

Thanks, @mkadunc. Interesting that several of your points are exactly contrary to what @edzer proposed to me before. I guess you can have some good discussions here while I'm on vacation. ;-)

It would be useful to have metadata about types in a dimension for specific data cubes, though

That's a pretty good idea indeed. I should add that to https://github.com/stac-extensions/datacube/pull/10

GeometryCollection is borderline

Right now we say in processes that

To maximize interoperability, a nested GeometryCollection should be avoided. Furthermore, a GeometryCollection composed of a single type of geometries should be avoided in favour of the corresponding multi-part type (e.g. MultiPolygon).

Not sure what backends actually do with this in implementation though.
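
For illustration, the recommendation quoted above could be applied like this, assuming GeoJSON-like dicts. The names `MULTI_OF` and `collapse_collection` are hypothetical; this is not what any backend is known to do.

```python
# Maps a single-part type to its multi-part counterpart.
MULTI_OF = {
    "Point": "MultiPoint",
    "LineString": "MultiLineString",
    "Polygon": "MultiPolygon",
}


def collapse_collection(geom):
    """Turn a single-type GeometryCollection into the matching multi-part type."""
    if geom.get("type") != "GeometryCollection":
        return geom
    members = geom.get("geometries", [])
    types = {member["type"] for member in members}
    if len(types) != 1:
        return geom  # mixed or empty collection: left unchanged
    (single_type,) = types
    if single_type not in MULTI_OF:
        return geom  # nested collections or multi-part members: left unchanged
    return {
        "type": MULTI_OF[single_type],
        "coordinates": [member["coordinates"] for member in members],
    }


collection = {
    "type": "GeometryCollection",
    "geometries": [
        {"type": "Point", "coordinates": [0, 0]},
        {"type": "Point", "coordinates": [1, 1]},
    ],
}
collapsed = collapse_collection(collection)
```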

I'd say GeoJSON (the 'non-standard' one with CRS).

Then it's not GeoJSON though. So do you mean the actually invalid variant (I'd like to avoid that), or were you referring to the new JSON-FG from OGC? https://github.com/opengeospatial/ogc-feat-geo-json (I could see us using that, but it's WIP).

Rename raster-cube to data-cube in the schema and replace everywhere. Then introduce raster-cube as a subclass, and use it instead of data-cube in processes that do special things with raster spatial dimensions (x,y).

That's a breaking change and requires processes v2.0. I assume implementors will not be happy about it. (Also, in the schemas we don't really have subclasses, except for subclassing native types.)

edzer commented 2 years ago

geometry seems better than vector. feature would also be an option, or reference-geometry

I also like geometry, or alternatively feature_geometry. In SFA a feature is a thing that has a geometry and other attributes.

I think I'm also in favour of a GeoJSON that does not restrict to EPSG:4326. Although that is an (IETF) standard, it's clearly out of date and not good enough for today's requirements. But the individual feature geometries must then each come with a CRS, right? Or will the CRS be a property of the metadata for the dimension as a whole?
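
The two options in that question can be contrasted in a small sketch. The field names `geometry_types` and `reference_system` loosely follow the STAC datacube extension and should be treated as assumptions here; the per-geometry `crs` key, the function `resolve_crs`, and the fallback to EPSG:4326 are purely hypothetical.

```python
# Dimension-level metadata: one CRS for all labels of the dimension.
geometry_dimension = {
    "type": "geometry",
    "geometry_types": ["Point"],
    "reference_system": 32632,  # EPSG code, assumed to apply to every label
    "bbox": [680000, 5150000, 681000, 5151000],
}


def resolve_crs(dimension, label=None):
    """Prefer a per-geometry CRS if present, else use the dimension-wide one."""
    if label is not None and "crs" in label:
        return label["crs"]
    # Assumed fallback that mirrors GeoJSON's WGS 84 default.
    return dimension.get("reference_system", 4326)


plain_label = {"type": "Point", "coordinates": [680000, 5150000]}
own_crs_label = {"type": "Point", "coordinates": [1262000, 5860000], "crs": 3857}

crs_from_dimension = resolve_crs(geometry_dimension, plain_label)
crs_from_label = resolve_crs(geometry_dimension, own_crs_label)
```

Keeping the CRS on the dimension avoids repeating it for every geometry, at the cost of forcing all labels into one reference system.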

m-mohr commented 2 years ago

Discussed with @edzer:

  1. Allow different types per dimension.
  2. Yes, restrict to the types mentioned above
  3. Representation of "dimension labels":
    • In metadata: see STAC
    • In processes: Vector cube, 1 vector dimension, 1 label
  4. ?
  5. See https://github.com/Open-EO/openeo-processes/issues/330
  6. geometry

m-mohr commented 2 years ago

Question 7: What do we do with additional "metadata", e.g. ids and properties assigned to a feature? Related: https://github.com/Open-EO/openeo-processes/issues/347#issuecomment-1070742781

Not sure about the IDs, but I guess for vector data you specify which properties to load into the data cube (as additional dimension if 2+ properties) and the rest is kept somewhere in the background. So we may want to add id and properties as additional optional fields to the vector dimension. There's no way to access this information through processes right now, but we should probably state that id and properties are generally kept untouched by processes unless a process states otherwise.
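
One possible reading of this, as a rough sketch: selected properties become values along an extra dimension, while the id and the remaining properties travel along untouched. The function `load_vector_cube` and the layout of the returned dict are hypothetical, not an openEO data structure, and for simplicity the sketch always creates the extra dimension.

```python
def load_vector_cube(feature_collection, properties):
    """Load selected feature properties as values; keep the rest in the background."""
    geometries, values, background = [], [], []
    for feature in feature_collection["features"]:
        props = feature.get("properties") or {}
        geometries.append(feature["geometry"])
        # One row per geometry, one column per selected property.
        values.append([props.get(name) for name in properties])
        # id and unselected properties are carried along unchanged.
        background.append({
            "id": feature.get("id"),
            "properties": {k: v for k, v in props.items() if k not in properties},
        })
    return {
        "dimensions": {"geometry": geometries, "properties": list(properties)},
        "values": values,
        "feature_metadata": background,
    }


fc = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "a1",
         "geometry": {"type": "Point", "coordinates": [0, 0]},
         "properties": {"pop": 10, "area": 1.5, "name": "a"}},
        {"type": "Feature", "id": "b2",
         "geometry": {"type": "Point", "coordinates": [1, 1]},
         "properties": {"pop": 20, "area": 2.5, "name": "b"}},
    ],
}
cube = load_vector_cube(fc, ["pop", "area"])
```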

This issue of additional metadata that is present at the start, may get passed through, and should be included in the result is also very much unspecified for raster, by the way.

mkadunc commented 2 years ago

for vector data you specify which properties to load into the data cube (as additional dimension if 2+ properties) and the rest is kept somewhere in the background. So we may want to add id and properties as additional optional fields to the vector dimension.

I'm not sure I understand this 'additional dimension' part — say we have a vector cube which stores a real-valued variable mean_reflectance with 3 dimensions (geometry, time, band), and we want to load 2 extra properties for vector data (e.g. id, land_class):

soxofaan commented 2 years ago

I agree with @mkadunc and had the same conceptual struggle in https://github.com/Open-EO/openeo-processes/issues/356

m-mohr commented 2 years ago

I think we need to discuss this again in detail with all experts. As we are close to the end of SRR3, we will likely not be able to tackle it beforehand so I'd propose to have a dedicated meeting afterward (or discuss it in Bolzano).

m-mohr commented 2 years ago

Some notes from the April PSC meeting: