Allow URLs as process namespace

m-mohr commented 1 year ago

To make it easy to load external processes (UDPs) in process graphs, it would be valuable to allow the namespace to be a URL, for read-only access. The process ID stays as it is.

soxofaan commented 1 year ago

FYI: the VITO backend already supports this (because it turned out to be a valuable solution in a lot of practical use cases). It is also already documented in the Python client: https://open-eo.github.io/openeo-python-client/cookbook/udp_sharing.html#using-a-public-udp-through-url-based-namespace

m-mohr commented 1 year ago

I was also wondering whether the namespace should leave out the process ID, but then I realized that the URL may or may not have a .json file extension, which makes it ambiguous. So specifying the full URL in the namespace (for URLs) makes sense.
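For illustration, a process graph node that references an external UDP through its full URL might look roughly like this, written as a Python dict. The URL, the node name `ndvi1`, and the `data` argument are placeholders assumed for this sketch, not an existing resource.

```python
import json

# Hypothetical URL of a JSON file holding a single UDP definition.
UDP_URL = "https://example.com/udp/ndvi.json"

process_graph = {
    "ndvi1": {
        "process_id": "ndvi",  # the process ID stays as it is
        "namespace": UDP_URL,  # full URL, including any file extension
        "arguments": {"data": {"from_parameter": "data"}},  # illustrative only
        "result": True,
    }
}

print(json.dumps(process_graph, indent=2))
```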

soxofaan commented 1 year ago

FYI: in the VITO implementation we currently support it as follows:

If the user provides process_id and namespace="https://...., we try the following URLs and take the first hit:

{namespace}{process_id}
{namespace}{process_id}.json
{namespace}

E.g. with process_id="ndvi" and namespace="https://example.com/udp/", these two URLs are attempted first:

https://example.com/udp/ndvi
https://example.com/udp/ndvi.json

The last option allows specifying the UDP URL fully through the namespace (ignoring process_id).
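The lookup order described above can be sketched as follows. This is not the actual VITO code: the function names are made up for illustration, and `fetch_json` is an assumed caller-supplied function that returns parsed JSON for a URL, or `None` when the URL is not a hit.

```python
def candidate_urls(process_id: str, namespace: str) -> list[str]:
    """Return the URLs to try, in order, for a URL-based namespace."""
    return [
        f"{namespace}{process_id}",
        f"{namespace}{process_id}.json",
        namespace,  # last resort: the namespace is the full UDP URL
    ]


def resolve_first_hit(process_id, namespace, fetch_json):
    """Return the first document that fetch_json can load, else None."""
    for url in candidate_urls(process_id, namespace):
        doc = fetch_json(url)
        if doc is not None:
            return doc
    return None


print(candidate_urls("ndvi", "https://example.com/udp/"))
```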

m-mohr commented 1 year ago

Yeah, I think for namespace URLs we should simplify that the URL is the exact URL to the file and you don't need to do any try&error. The only thing that you need to check is the response. If it's a single process, then just use it. If it is a process list response, take the process from the list with the given process ID.

If not a URL, resolve as usual.
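A minimal sketch of that response check, assuming a process list response is recognizable by a top-level "processes" array (as in GET /processes) and that anything else is a single process. The function name `select_process` is hypothetical.

```python
def select_process(document: dict, process_id: str) -> dict:
    """Pick the process definition out of a JSON document fetched from a URL."""
    if "processes" in document:
        # Process list response: take the entry with the given process ID.
        for process in document["processes"]:
            if process.get("id") == process_id:
                return process
        raise LookupError(f"process {process_id!r} not found in listing")
    # Otherwise treat the document as a single process and use it as-is.
    return document
```

With this rule, the process ID only matters for list responses; for a single-process document it is not needed to locate the definition.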

jdries commented 6 months ago

This is becoming a key issue for federated setups and for ESA APEx.

In APEx, the use case is to have the UDPs stored as JSON on GitHub rather than being managed in a specific backend. This should then make UDP invocation less backend dependent, and simplify the management.

@soxofaan @m-mohr We'll probably want to discuss what is still to be done here.

m-mohr commented 5 months ago

Community meeting: No objections, next step is creating a PR for review.

soxofaan commented 5 months ago

> Yeah, I think for namespace URLs we should simplify that the URL is the exact URL to the file and you don't need to do any try&error.

yes that's fine for me

> The only thing that you need to check is the response. If it's a single process, then just use it. If it is a process list response, take the process from the list with the given process ID.

I'm not sure that process listing support is even necessary. I would work from the assumption that users prefer a one-to-one mapping between UDPs and files. For example, the Python client has some helpers to store a UDP to, or load it from, a file. If you want to manage multiple UDPs in the same file, you have to do a lot more cumbersome housekeeping on your own.

So I would keep the API extension very basic for now:

if namespace starts with https?://: assume this is the URL of a JSON file containing a UDP representation and load the UDP directly from this URL
else: proceed as usual
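The basic rule above could be sketched like this. Both `fetch_json` (loads and parses the JSON file at a URL) and `resolve_as_usual` (the backend's existing namespace handling) are assumed callables passed in by the caller; neither is an existing openEO API.

```python
import re

URL_NAMESPACE = re.compile(r"^https?://")


def is_url_namespace(namespace) -> bool:
    """True when the namespace should be read as the URL of a UDP JSON file."""
    return isinstance(namespace, str) and URL_NAMESPACE.match(namespace) is not None


def load_process(process_id, namespace, fetch_json, resolve_as_usual):
    """Load a process definition, treating URL namespaces specially."""
    if is_url_namespace(namespace):
        # The namespace is the exact URL of the UDP document.
        return fetch_json(namespace)
    # Any other namespace (including None) is resolved as before.
    return resolve_as_usual(process_id, namespace)
```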

Also note that even with such a simple API spec, there is still room at the level of clients to provide a richer UI (e.g. user passes a URL as process id -> client automatically converts that to actual process id + URL namespace).
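One way a client could do that conversion, as a rough sketch only: it assumes the JSON document at the URL carries the real process ID in an "id" field, and `fetch_json` is again a hypothetical caller-supplied loader. This is not an existing helper in any client.

```python
def split_process_reference(process_id: str, fetch_json):
    """Return (process_id, namespace) for a process ID that may be a URL."""
    if process_id.startswith(("http://", "https://")):
        udp = fetch_json(process_id)  # load the UDP document from the URL
        # Real process ID comes from the document, the URL becomes the namespace.
        return udp["id"], process_id
    # Plain process IDs are passed through without a namespace.
    return process_id, None
```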

m-mohr commented 5 months ago

The reasoning behind allowing lists of processes (as in GET /processes) is that it's the only official public endpoint we have to load processes from. Everything else loads from contexts outside the openEO API.

soxofaan commented 5 months ago

Ah ok, makes sense. So with a process listing document you mean a listing in the style of GET /processes and GET /process_graphs:

{
  "processes": [
    {"id": "..", "parameters": [...], "process_graph": {...}},
    {"id": "..", "parameters": [...], "process_graph": {...}}
  ],
  "links": [..]
}

The problem is that the spec for process listings currently allows, and even recommends, leaving the "process_graph" fields out of the listing, while this is actually the vital part for UDPs.
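To illustrate the concern: a consumer of such a listing would have to check each entry for a "process_graph" before using it as a UDP. A small sketch, with a hypothetical function name and made-up sample data:

```python
def usable_udps(listing: dict) -> list[dict]:
    """Keep only the listed processes that actually include a process graph."""
    return [p for p in listing.get("processes", []) if "process_graph" in p]


listing = {
    "processes": [
        {"id": "ndvi", "process_graph": {}},  # complete UDP
        {"id": "evi"},                        # metadata only, not usable as a UDP
    ],
    "links": [],
}
print([p["id"] for p in usable_udps(listing)])
```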

m-mohr commented 5 months ago

Yes, that's what I meant. The process listing might be more interesting outside of the /processes context, but more for a public variant of /process_graphs. Maybe we need to think about #348 again...

m-mohr commented 5 months ago

@jdries @soxofaan PR is up at #538

soxofaan commented 4 months ago

With #538 being merged now, I think this ticket #515 can be closed for now.

Open-EO / openeo-api

Allow URLs as process namespace #515