Open-EO / openeo-processes

Interoperable processes for openEO's big Earth observation cloud processing.
https://processes.openeo.org
Apache License 2.0

Support data URIs in `load_url` #498

Open soxofaan opened 9 months ago

soxofaan commented 9 months ago

load_url currently only supports HTTP(S) URLs:

https://github.com/Open-EO/openeo-processes/blob/965bbaebd4d5984203a0437076c85a66a72a23e0/proposals/load_url.json#L12-L19

For one use case, we were brainstorming about how to avoid the overhead of creating and managing external URLs (for a lot of small files). We came up with the idea of loading from data URLs, where the data is embedded as base64 inside the process graph, with no need for external files or URLs. E.g.:

  "lu": {
    "process_id": "load_url",
    "arguments": {
      "url": "data:application/vnd.apache.parquet;base64,UEFSMRUEFRAVFEwVAhUAEgAACBwqAAAAAAAAAB...",  
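
To make the idea above concrete, here is a minimal Python sketch of how a client could build such a node. The helper names `to_data_url` and `load_url_node` are hypothetical, the payload is placeholder bytes rather than a real Parquet file, and any other arguments `load_url` takes are left out. It only shows the encoding step and is not how any existing client does it.

```python
import base64
import json


def to_data_url(payload: bytes, media_type: str) -> str:
    """Encode raw bytes as a base64 data URL (RFC 2397 style)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def load_url_node(payload: bytes, media_type: str) -> dict:
    """Build a load_url node whose "url" argument embeds the payload.

    Hypothetical helper: other load_url arguments are omitted here.
    """
    return {
        "process_id": "load_url",
        "arguments": {"url": to_data_url(payload, media_type)},
    }


# Placeholder bytes; a real use case would read a small Parquet file instead.
node = load_url_node(b"PAR1 placeholder", "application/vnd.apache.parquet")
print(json.dumps({"lu": node}, indent=2))
```

Base64 inflates the payload by roughly one third, so this would mostly make sense for small files.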

soxofaan commented 7 months ago

I can cook up a PR for this if there is more interest in this feature.

clausmichele commented 7 months ago

Interesting! This will probably carry the risk of creating heavy process graphs, but the same already happens with inline GeoJSON anyway. Good to have another option, and happy to try to support it.

soxofaan commented 7 months ago

Indeed, we already had issues with users embedding huge GeoJSON constructs in their process graph, so this would not create a new problem. As a matter of fact, the textual representation of GeoJSON makes it very space-inefficient and data URLs could improve the situation because of binary encoding and compression.

But still, it could be the responsibility of the clients to put reasonable thresholds on this and to warn about or forbid excessive payloads.
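
A client-side guard along those lines could look roughly like the Python sketch below. The function names and both thresholds (`WARN_AT`, `FAIL_AT`) are assumptions picked for illustration, not values from any openEO client or spec. It walks a process graph, measures every embedded `data:` URL, and warns or refuses depending on size.

```python
import warnings

# Hypothetical thresholds, in characters of the full data URL string.
WARN_AT = 100_000
FAIL_AT = 1_000_000


def find_data_urls(value):
    """Yield every string starting with "data:" nested anywhere in value."""
    if isinstance(value, str):
        if value.startswith("data:"):
            yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from find_data_urls(item)
    elif isinstance(value, list):
        for item in value:
            yield from find_data_urls(item)


def check_data_urls(graph: dict, warn_at: int = WARN_AT, fail_at: int = FAIL_AT) -> list:
    """Return the sizes of all embedded data URLs.

    Warns for sizes above warn_at and raises ValueError above fail_at.
    """
    sizes = []
    for url in find_data_urls(graph):
        size = len(url)
        if size > fail_at:
            raise ValueError(f"data URL of {size} characters exceeds limit {fail_at}")
        if size > warn_at:
            warnings.warn(f"large data URL in process graph: {size} characters")
        sizes.append(size)
    return sizes


graph = {
    "lu": {
        "process_id": "load_url",
        "arguments": {"url": "data:text/plain;base64,aGVsbG8="},
    }
}
print(check_data_urls(graph))
```

The same walk would also catch data URLs nested deeper in the graph, for example inside child process graphs, since it recurses through all dicts and lists.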