Closed m-mohr closed 4 months ago
FYI: the VITO backend already supports this (because this turned out a valuable solution in a lot of practical use cases). It is also already documented in python client: https://open-eo.github.io/openeo-python-client/cookbook/udp_sharing.html#using-a-public-udp-through-url-based-namespace
I was also wondering whether the namespace should not include the process ID, but then I realized that the URL may happen to have (or not have) a .json file extension and that makes it ambiguous. So specifying the full url in the namespace (for URLs) makes sense.
FYI: in VITO implementation we currently support it as follows
if user provides process_id
and namespace="https://....
we try the following URLs and take the first hit:
{namespace}{process_id}
{namespace}{process_id}.json
{namespace}
e.g. with process_id="ndvi"
and namespace="https://example.com/udp/"
these two URLs are first attempted:
https://example.com/udp/ndvi
https://example.com/udp/ndvi.json
The last option is to allow specifying the UDP URL fully through the namespace (ignoring process_id)
Yeah, I think for namespace URLs we should simplify that the URL is the exact URL to the file and you don't need to do any try&error. The only thing that you need to check is the response. If it's a single process, then just use it. If it is a process list response, take the process from the list with the given process ID.
If not a URL, resolve as usual.
This is becoming a key issue for federated setups and for ESA APEx.
In APEx, the use case is to have the UDP's stored as json in github rather than being managed in a specific backend. This should then make UDP invocation less backend dependant, and simplify the management.
@soxofaan @m-mohr We'll probably want to discuss what is still to be done here.
Community meeting: No objections, next step is creating a PR for review.
Yeah, I think for namespace URLs we should simplify that the URL is the exact URL to the file and you don't need to do any try&error.
yes that's fine for me
The only thing that you need to check is the response. If it's a single process, then just use it. If it is a process list response, take the process from the list with the given process ID.
I'm not sure that process listing support is even necessary. I would work from the assumption that users prefer a 1-on-1 mapping between UDPs and files. For example, the python client has some helpers to store/load a UDP from/to a file. If you want to manage multiple UDPs in the same file, you have to do a lot more cumbersome housekeeping on your own.
So I would keep the API extension very basic for now:
https?://
: assume this is a URL of JSON file containing a UDP representation and load the UDP directly form this URIAlso note that even with such a simple API spec, there is still room at the level of clients to provide a more rich UI (e.g. user passes a URL as process id -> client automatically converts that to actual process id + URL namespace)
The reasoning behind allowing lists of processes (as in GET /processes) is that it's the only official public endpoint we have to load processes from. Everything else loads from non openEO API contexts.
Ah ok, makes sense. So with a process listing document you mean a listing in the style of GET /processes
and GET /process_graphs
:
{
"processes": [
{"id": "..", "parameters": [...], "process_graph": {...}},
{"id": "..", "parameters": [...], "process_graph": {...}},
],
"links": [..]
}
Problem is that the spec for process listing currently allows and even recommends to not include the "process_graph" fields in the process listing, while this is actually the vital thing for UDPs
Yes, that's waht I meant. The process listing might be more interesting outside of the /processes context, but more for a public variant of /process_graphs. Maybe we need to think about #348 again...
@jdries @soxofaan PR is up at #538
With #538 being merged now, I think this ticket #515 can be closed for now I think
To be able to load external processes (UDPs) in process graphs easily, it would be valuable to have the namespace being a URL for read-only access. The process ID stays as it is.