Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0

Vector cube design: property preservation when using aggregate spatial #466

Open soxofaan opened 1 year ago

soxofaan commented 1 year ago

(Related to use case experiments discussed in https://github.com/Open-EO/openeo-processes/issues/448 https://github.com/Open-EO/openeo-processes/issues/449)

Set up:

Question: are the original GeoJSON-style properties of vc1 still available in vc2?

I kind of remember vector cube discussions where we wanted preservation of properties (the "Yes" approach), e.g. using aggregate_spatial to "enrich" a vector cube with additional "columns" of aggregation data. However, I think the current design of vector cubes enforces the "No" approach, because there are just cube values and you cannot generically/automatically combine pre-existing cube data with new (aggregation) cube data.

jdries commented 1 year ago

Don't have a good answer to this one. For now, we kind of implement the 'no' approach, but we do try to preserve things like the feature identifier, as this is relevant to keep track of which timeseries belongs to which geometry.

m-mohr commented 10 months ago

The feature identifier is not part of the (GeoJSON) properties and belongs to the "core metadata" (as it resides at the top-level). That was at least always my "mental" model, based on GeoJSON. I'd think it's probably a good idea to keep track of it anyway. Maybe we need to clarify this?

My aim in 2.0.0 was to clearly communicate whether properties are preserved or not. Maybe it's not clear enough in all processes, but at least aggregate_spatial mentioned in the geometries parameter:

Feature properties are preserved for vector data cubes and all GeoJSON Features.

load_geojson, vector_to_random_points, vector_to_regular_points and vector_buffer similarly say:

Feature properties are preserved.

soxofaan commented 9 months ago

The feature identifier is not part of the (GeoJSON) properties and belongs to the "core metadata" (as it resides at the top-level).

I assume you are talking here about a "id" member of a Feature object, e.g.

{
  "type": "Feature",
  "id": "abc123",
  "geometry": {...},
  "properties": {...}
}

While this seems to be part of the GeoJSON RFC (If a Feature has a commonly used identifier, that identifier SHOULD be included as a member of the Feature object with the name "id"), I think I've seen quite a few cases in the wild where the "id" sits under "properties" instead of at the Feature top level. Maybe this can be addressed by adding an option to the GeoJSON loading processes to promote a given property to the more standard "id".
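
To make the "promote a property to id" idea concrete, here is a minimal client-side sketch in plain Python. The function name `promote_property_to_id` and its `property_name` parameter are hypothetical; no such option exists in the openEO processes today, this only illustrates what it could do.

```python
import copy


def promote_property_to_id(feature_collection, property_name):
    """Return a copy of a GeoJSON FeatureCollection in which each Feature
    that lacks a top-level "id" gets one copied from the given property.

    Hypothetical helper, not part of any openEO process or client.
    """
    result = copy.deepcopy(feature_collection)
    for feature in result.get("features", []):
        properties = feature.get("properties") or {}
        # Keep an existing top-level id; only fill it in when it is missing.
        if "id" not in feature and property_name in properties:
            feature["id"] = properties[property_name]
    return result


collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [5.0, 51.0]},
            "properties": {"id": "abc123", "name": "field A"},
        },
        {
            "type": "Feature",
            "id": "top-level",
            "geometry": {"type": "Point", "coordinates": [5.1, 51.1]},
            "properties": {"id": "ignored"},
        },
    ],
}

promoted = promote_property_to_id(collection, "id")
```

The property is copied rather than moved, so the original "properties" stay intact and an existing top-level "id" always wins.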

soxofaan commented 9 months ago

My aim in 2.0.0 was to clearly communicate whether properties are preserved or not. Maybe it's not clear enough in all processes, but at least aggregate_spatial mentioned in the geometries parameter:

Feature properties are preserved for vector data cubes and all GeoJSON Features.

Well, part of the problem I'm trying to raise here is that there is a conflict here regarding vector cube design:

For example:

You cannot combine the original cube data ["geometry", "property"] with the aggregated cube data ["time", "bands", "geometry"] in a single cube, e.g. because the number of dimensions is different. The dimension types of "property" (type "other"?) and "bands" (type "bands") are probably also not compatible, strictly speaking, but that could be adapted relatively easily, I guess.

So what I'm trying to say is this current statement in aggregate_spatial

Feature properties are preserved for vector data cubes and all GeoJSON Features.

is incompatible with the current consensus for vector cube design (store properties as cube values).
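
The dimension mismatch described above can be spelled out with a small Python sketch. It only compares dimension names; `can_share_cube` is a hypothetical and deliberately simplified check (same set of dimensions required), not how any openEO back-end decides this.

```python
def can_share_cube(dims_a, dims_b):
    """Simplified assumption: two sets of values fit in one cube only if
    they are defined over the same dimensions (order does not matter)."""
    return sorted(dims_a) == sorted(dims_b)


# Dimension names taken from the example in this thread.
original_dims = ["geometry", "property"]
aggregated_dims = ["time", "bands", "geometry"]

# Which dimensions do both cubes have, and which exist on one side only?
shared = [d for d in original_dims if d in aggregated_dims]
only_original = [d for d in original_dims if d not in aggregated_dims]
only_aggregated = [d for d in aggregated_dims if d not in original_dims]

compatible = can_share_cube(original_dims, aggregated_dims)
```

Only "geometry" is shared: "property" has no counterpart in the aggregated cube, and "time" and "bands" have none in the original one, so the values cannot simply be stacked.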

pankajdpatil commented 1 month ago

Sorry, new to openEO, so please pardon me if I am off on a tangent.

From aggregate_spatial:

Feature properties are preserved for vector data cubes and all GeoJSON Features.

However, aggregate_spatial somehow isn't preserving the properties or the "id" from my GeoJSON features. Instead I get a "feature_index", but it's difficult to tie that back to the original feature.

My aggregate_spatial logic looks like:

"aggregate29": {
  "process_id": "aggregate_spatial",
  "arguments": {
    "data": { "from_node": "load2" },
    "geometries": {
      "type": "FeatureCollection",
      "features": [
        {
          "type": "Feature",
          "id": "pp1",
          "geometry": {
            "type": "Point",
            "coordinates": [ 76.90113420870438, 23.06615990794603 ]
          },
          "properties": { "pp": "Dinagat Islands", "kk": 10 }
        }
      ]
    },
    "reducer": {
      "process_graph": {
        "first1": {
          "process_id": "first",
          "arguments": {
            "data": [ { "from_parameter": "data" }, { "from_parameter": "context" } ],
            "ignore_nodata": false
          },
          "result": true
        }
      }
    }
  }
},

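One possible client-side way to tie a "feature_index" back to the input features is a lookup table built from the same FeatureCollection. This is only a sketch under a loud assumption: that "feature_index" is the zero-based position of the feature in the "features" array. That is not guaranteed by the process specification and depends on the back-end. The helper name `feature_index_lookup` is hypothetical.

```python
def feature_index_lookup(feature_collection):
    """Map each feature's position in the input to its id and properties.

    ASSUMPTION: the back-end's "feature_index" equals the zero-based
    position in the "features" array. Verify this for your back-end.
    """
    lookup = {}
    for index, feature in enumerate(feature_collection["features"]):
        lookup[index] = {
            "id": feature.get("id"),
            "properties": feature.get("properties") or {},
        }
    return lookup


# The same FeatureCollection that was passed as "geometries" above.
geometries = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "pp1",
            "geometry": {
                "type": "Point",
                "coordinates": [76.90113420870438, 23.06615990794603],
            },
            "properties": {"pp": "Dinagat Islands", "kk": 10},
        }
    ],
}

lookup = feature_index_lookup(geometries)
```

With such a table, a result row carrying feature_index 0 could be joined back to the feature with id "pp1", provided the ordering assumption holds.
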
m-mohr commented 1 month ago

@pankajdpatil You are talking about a specific implementation. For support you need to contact the provider.

soxofaan commented 1 month ago

@pankajdpatil this thread is indeed about how to preserve e.g. feature ids, but on a more conceptual and back-end oriented level. I think your support request will be better served on an openEO forum such as (depending on the openEO backend you are using): https://forums.openeo.cloud/ or https://forum.dataspace.copernicus.eu/