process to convert inline GeoJSON to a vector cube

soxofaan commented 2 years ago

Closely related to #343 (or maybe even a duplicate):

Do we need a process that converts inline GeoJSON to a vector cube?

m-mohr commented 2 years ago

Yeah, I've added it more explicitly to issue #343.

soxofaan commented 2 years ago

Still, regardless of #343, it could be useful to have an explicit process, which can be more powerful than implicit conversions with best-effort defaults.

Possible features/options to consider:

handle Polygon/MultiPolygon by implicitly handling them as single-feature feature collections
automatically add a unique (e.g. auto-increment int) "id" property
drop provided properties, or fill in missing properties with a default
drop features that are missing certain properties

(caveat: I'm in brainstorm mode here, not sure everything is that useful after more thought)
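To make the brainstormed options above concrete, here is a minimal Python sketch of what such an explicit conversion step could do before building a vector cube. The function name `normalize_geojson` and all of its parameters are hypothetical and not part of any openEO specification; writing the auto-increment id into `properties` and checking required properties before applying defaults are assumptions too.

```python
from copy import deepcopy

# Geometry types defined by GeoJSON (RFC 7946).
GEOMETRY_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
}


def normalize_geojson(data, id_property="id", keep_properties=None,
                      defaults=None, required=()):
    """Hypothetical helper: turn any GeoJSON object into a FeatureCollection.

    - bare geometries and single features become one-feature collections
    - features missing any property in `required` are dropped
    - only properties listed in `keep_properties` are kept (None keeps all)
    - missing properties are filled in from `defaults`
    - an auto-increment integer is written to `id_property`
      (overwriting an existing property of that name)
    """
    data = deepcopy(data)  # never modify the caller's object
    kind = data.get("type")
    if kind in GEOMETRY_TYPES:
        features = [{"type": "Feature", "geometry": data, "properties": {}}]
    elif kind == "Feature":
        features = [data]
    elif kind == "FeatureCollection":
        features = data["features"]
    else:
        raise ValueError(f"Unsupported GeoJSON type: {kind!r}")

    result = []
    for feature in features:
        props = feature.get("properties") or {}
        # Assumption: the required check looks at the provided properties,
        # before any defaults are applied.
        if any(name not in props for name in required):
            continue
        if keep_properties is not None:
            props = {k: v for k, v in props.items() if k in keep_properties}
        for name, value in (defaults or {}).items():
            props.setdefault(name, value)
        props[id_property] = len(result)  # ids count only the kept features
        feature["properties"] = props
        result.append(feature)
    return {"type": "FeatureCollection", "features": result}


# Usage: a bare Polygon is wrapped into a single-feature collection.
polygon = {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}
collection = normalize_geojson(polygon, defaults={"label": "unknown"})
```

Whether the id belongs in `properties` or in the top-level GeoJSON `id` member is exactly the kind of default an explicit process would have to document.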

m-mohr commented 2 years ago

This could also be more easily extended to load e.g. OGC's new JSON-FG.

m-mohr commented 2 years ago

How does this relate to load_files? In principle, it doesn't matter where the GeoJSON comes from; you have the same issues and parameters for loading/converting the data. So it's a bit strange to have a dedicated process for inline GeoJSON and a completely different process, load_files, for loading all kinds of files (where the parameters for GeoJSON are defined in GET /file_formats). All your options above should therefore go into GET /file_formats, but then they are backend-dependent.

soxofaan commented 2 years ago

Good point, there would indeed be quite some overlap.

An alternative is to define a process like load_vectorcube, which would be the vector equivalent of what load_collection is for raster cubes. It could accept several source types:

1. (inline) JSON object, to be interpreted as GeoJSON/JSON-FG
2. relative path (string) for loading vector data from the user workspace
3. URL string(s) for loading vector data from a URL

Or we add option 1 to the current load_files proposal of #322 and find a more generic name than "file", so that it isn't weird to load inline GeoJSON that way.
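As an illustration of how a single process could tell the three source types listed above apart, here is a small Python sketch. The function `classify_source` is hypothetical, and so is the rule that only `http`/`https` strings count as URLs while every other string is treated as a workspace path.

```python
from urllib.parse import urlparse


def classify_source(source):
    """Hypothetical dispatch for a load_vectorcube-like process.

    Returns "inline" for a JSON object, "url" for http(s) URL string(s)
    and "path" for any other string (assumed to be a workspace path).
    """
    if isinstance(source, dict):
        return "inline"
    if isinstance(source, str):
        # Assumption: only http and https count as loadable URLs.
        if urlparse(source).scheme in ("http", "https"):
            return "url"
        return "path"
    if isinstance(source, (list, tuple)) and source and all(
        isinstance(item, str) and classify_source(item) == "url"
        for item in source
    ):
        return "url"  # a non-empty list of URLs
    raise TypeError(f"Unsupported source: {source!r}")
```

The sketch also shows the ambiguity such a process has to resolve: a plain string could be a path, a URL or even serialized JSON, so the rules must be spelled out in the process description.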

m-mohr commented 2 years ago

load_collection could also load a vector cube, if you expose vector data from the back-end side at /collections. Also, the second word in load_* processes always refers to the origin, not to what is returned.

The issue with load_files, on the other hand, is that it's weird to mix in inline data. This isn't necessarily limited to GeoJSON; you could also inline CovJSON at some point. And then it gets messy.

The underlying issue here is that the processes don't really know about files and formats. This is all handled through /file_formats right now, also for raster. In that sense GeoJSON is no different, but it seems to get some kind of special handling right now because it is easy to include in process graphs. So what we may need to start doing at the openEO level at some point is to document best practices around file formats, similar to what we already do to some extent in Platform: https://docs.openeo.cloud/federation/backends/fileformats.html

This is challenging though, as for some formats the conversion is not really clear (GeoTIFF), while for GeoJSON to vector cubes it might be a bit clearer. I guess we just need to describe more clearly how vector data in general (e.g. ids, geometries, properties) is handled and converted. That then helps with all other vector-related file formats, too.
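To illustrate the kind of rules such a description would have to pin down, here is one possible reading, sketched in Python, of how the ids, geometries and properties of a FeatureCollection could be separated. The function `split_features` and the returned layout are hypothetical. Falling back to the feature index when `id` is missing and filling absent properties with `None` are assumptions, not documented openEO behavior.

```python
def split_features(feature_collection):
    """Hypothetical split of a GeoJSON FeatureCollection into three parts:
    ids, geometries and a features-by-properties table of values."""
    features = feature_collection["features"]
    # Assumption: use the GeoJSON "id" member, else the feature's position.
    ids = [f.get("id", index) for index, f in enumerate(features)]
    geometries = [f["geometry"] for f in features]
    # The union of all property names defines the table columns.
    names = sorted({n for f in features for n in (f.get("properties") or {})})
    # Assumption: properties missing on a feature become None.
    values = [
        [(f.get("properties") or {}).get(n) for n in names]
        for f in features
    ]
    return {
        "ids": ids,
        "geometries": geometries,
        "property_names": names,
        "values": values,
    }
```

Each of the commented assumptions is a place where back-ends could diverge if the conversion is left undocumented.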

Open-EO / openeo-processes

process to convert inline GeoJSON to a vector cube #346