Unification of /processes and /process_graphs

soxofaan commented 4 years ago

This is follow up issue of #305/#309, which added a namespace property to process graph nodes to identify whether the process_id property refers to a predefined process or a user defined process.

Possible next steps:

merge GET /processes and GET /process_graphs and work with an additional namespace request parameter to select PDP or UDP (or even other namespaces)
unify all /processes and /process_graphs requests to something like /processes/{namespace}/{process_graph_id}

jdries commented 3 years ago

This issue is becoming relevant again for us: we'd like to allow users to expose their process graphs. This is supported by the namespacing, but it seems they are not yet discoverable? I would try to avoid breaking the API for this, at least not in a 1.1 version. So certainly the /processes endpoint should not change in an incompatible way.

So I'm proposing something like: /namespaces -> list all namespaces /namespaces/{namespace} -> list all processes in a namespace /namespaces/{namespace}/{process_graph_id} -> list process metadata This resembles the GitHub approach: our namespaces would be called an 'organization' in GitHub. Users could store process graphs in multiple organizations (namespaces).

By not reusing /processes or / process_graphs we are fully backwards compatible. I'm not yet sure if 'namespaces' is the best name?

jdries commented 3 years ago

Rethinking this: the use case of sharing user defined processes is also what the 'marketplace' in openEO platform is supposed to do. So we could also consider defining an entirely separate microservice, that can move at its own pace, so we can move at a faster pace compared to openEO versions? (Meaning in the beginning, when we are defining and changing such a service.)

aljacob commented 3 years ago

What do you mean @jdries with a seperate microservice? Just an arbitrary http service that is able to work with openEO processes or process_graph and store and offer them? I think the namespaces proposal from you, from yesterday was not bad. We can still keep that as an experimental endpoint for now, without a new API release.

Thinking more about it... I have to say I am also a bit confused by the duality of /processes and /process_graphs now that I have implemented version 1. As process_graphs are anyways always wrapped in a process when send to the back-end for processing, maybe it would actually make sense, to organize also their storage and sharing via processes. Similar to what @soxofaan has proposed originally.

m-mohr commented 3 years ago

The /process_graphs endpoint is a legacy from the old 0.4/... days, where this was the endpoint for process graphs. The name currently is also wrong, because it serves user processes, not process graphs only. As the change from process graphs to processes and namespaces came in rather late in 1.0, we didn't really made the big change to completely refactor the endpoints to avoid too many changes. Ultimately, the desire would be to have everything under /processes.

I'm not a big fan of the /namespaces approach, I must admit. This needs quite a lot of round trips to get the actual process information if it's the only way to get the information. Also, namespace is less clear than processes. I like the proposal from @soxofaan. This can likely be done with relatively low work and /process_graphs could just redirect for some time. (Now I'm happy that we never did /processes/:id as this now allows us to use the namespace under /processes as we like).

This would be somewhat backward-compatible, I think: /processes - pre-defined processes, stays as it is, could be an alias for /processes/backend /processes/:namespace - all processes under the namespace, could be a user id. Shortcut for the user id could be /processes/user /processes/:namespace/:processid - GET/DELETE/PUT for individual processes

Redirects: /process_graphs => /processes/:userid /process_graphs/:id => /processes/:userid/:id

The only thing missing is a list of namespaces, but that could be retrieved via GET / either as part of the endpoint list or as a separate property.

This way we could have it in a 1.1 release, I think. It could also just be an API extension for now. At the same time, we should think about sharing though.

A microservice at the EOSC marketplace or the Hub could just implement the same API. We had that earlier for 0.4 at the Hub, but had no time yet to update to 1.0.

jdries commented 3 years ago

The proposal to allow /processes/:namespace and /processes/:namespace/:processid is also fine for me. Would the fact that '/processes/somename' can be both a namespace or a process be problematic? Like in the defining the typing of such a call?

I also created a diagram of a broader marketplace design, to give an idea of what I'm thinking of in terms of additional use cases that I'd like to support:

m-mohr commented 3 years ago

Would the fact that '/processes/somename' can be both a namespace or a process be problematic? Like in the defining the typing of such a call?

/process is the only endpoint at the moment - /processes/:processid is not defined in the API and would not be in the proposal! It's only allowing /processes/:namespace. See my comment above.

I also created a diagram of a broader marketplace design

Looks interesting, but is also quite an effort. How's the relationship with the Hub?

jdries commented 3 years ago

Indeed, you're right, our backend allows this: https://openeo.vito.be/openeo/1.0/processes/raster_to_vector but it's simply not supported API, or a leftover from 0.4.0. That indeed simplifies matters.

It's indeed not entirely trivial, it could be an addition to the Hub, but it could also be an entirely separate component. (It's similar in the sense that it's also central, but it doesn't require links to individual backends, like the hub does. Unless if we would aggregate all user defined processes on all backends, that would also be an option...) There's also a potential relationship with existing marketplaces, like EOSC, that have to do some of these things already. Or a Zenodo for process graphs...

m-mohr commented 3 years ago

@jdries Let's discuss the Hub/Marketplace integration in teams or in a separate issue, it doesn't really fit into this issue.

jdries commented 3 years ago

Agreed, if we can do a non-breaking API extension for user defined process discovery, then I can at least proceed. I believe the only open things is retrieving list of namespaces. As it can get rather long, I would prefer a separate property where you could even allow paging...

m-mohr commented 3 years ago

@jdries I could dig into this in Nov and make a proposal. Would that work? When do you need it?

jdries commented 3 years ago

Sure, that would work. We're already experimenting with offering user defined processes, but the discovery part might not be crucial in the beginning. If we would need it, we'd probably implement the most recent proposal in this issue.

jdries commented 3 years ago

@m-mohr @soxofaan I wrote some docs on how users can currently publish their processes: https://github.com/Open-EO/openeo-python-client/blob/master/examples/notebooks/ProcessAsAService.ipynb As a next step, I'd like to start on some of the things proposed in this issue. Do any of you still see alternative options than the one described above?

m-mohr commented 3 years ago

@jdries

Re notebook: Where to comment best on the notebook? I added comment to the commit as I think it would "bloat" this discussion a bit.

Re API for sharing processes: I think it's still the approach best integrated with the current API, so go ahead. Once D35 is submitted (tomorrow?) I can likely also draft an OpenAPI fragment...

Re namespaces: I think the only thing not covered in my proposal was where to get the namespaces. I'd propose to add a namespaces field to /processes. Just an array of namespaces to be used in /processes/:namespace. The latter endpoint then can return processes and namespace metadata if there's any.

soxofaan commented 3 years ago

I agree with @m-mohr : this ticket is mostly about new/changed features in a newer (incompatible?) version of the API, while working with stored process graphs should already be possible with API version 1.0

m-mohr commented 3 years ago

I'll work on this today and hopefully have a OpenAPI this afternoon. Any recent activities I should consider?

soxofaan commented 3 years ago

I added a proof of concept implementation to VITO backend for listing "public" user defined processes under /processes/:namespace where :namespace is something like u:john.

Some things that came up while working on this:

do we predefine the format of namespaces? e.g. for our use case we want to use per user/organization namespaces. I now just implemented user "namespaces" by adding prefix u: to the userid/name, but this is a bit ad hoc of course.
Another issue is that the user/organistation name might have characters that do not play nice in URLs. Likewise: we have been playing with the idea of using (github) urls as "namespace", which also does not play well inside a URL parameter
We are also using a "public" flag for User defined processes: public UDPs are "visible" for other users (or even unauthenticated requests) under the user's namespace, while non-public UDPs are only visible to the user. This is "public" flag is not part of the spec yet, and might be interesting to include in this context

m-mohr commented 3 years ago

do we predefine the format of namespaces? e.g. for our use case we want to use per user/organization namespaces. I now just implemented user "namespaces" by adding prefix u: to the userid/name, but this is a bit ad hoc of course.

It's something like a namespace for namespaces. ~I'm also thinking whether that should be an API thing or not, i.e. the current approach forbids the user_ids user and backend and - if left open for implementation - that should at least be a note in the user_id specification. The best separation would be /processes/<backend|user|...>/{id}, but then there's no additional id for back-ends so~ a different character would make sense, preferably one that doesn't need encoding. Hmm....

Edit: No, wait, back-end processes are at /processes and then the namespace part could always a plain user-id... Then there are no conflicts by default as namespace = user id

Another issue is that the user/organistation name might have characters that do not play nice in URLs. Likewise: we have been playing with the idea of using (github) urls as "namespace", which also does not play well inside a URL parameter

Although I see the issue with special characters, I don't understand the URLs as namespaces issues. External URLs would not be exposed via the /processes/{namespace} as it's an external process one may load into the process (graph).

We are also using a "public" flag for User defined processes: public UDPs are "visible" for other users (or even unauthenticated requests) under the user's namespace, while non-public UDPs are only visible to the user. This is "public" flag is not part of the spec yet, and might be interesting to include in this context

Sharing is indeed a completely different topic. While public: true/false is the most basic answer for it, it doesn't take into account that there might also be sharing on a per-user basis or so.

m-mohr commented 3 years ago

The question to answer is: Is it likely that we introduce other namespaces than the ones for user-defined and pre-defined processes? (What could these be?) Should the user id by default be the namespace, e.g. /processes/soxofaan? This would require some prefixing for other namespaces. Or should users by default have a namespace? I'm thinking about the @ char, which is disallowed in the user id, so for example /processes/@soxofaan.

soxofaan commented 3 years ago

Is it likely that we introduce other namespaces than the ones for user-defined and pre-defined processes? (What could these be?)

That's up to the imagination of a backend of course. At VITO, we are for example playing with the idea of per-organization or per-project namespaces. When working with multiple people on the same project it can be useful to have a shared namespace instead of a per-user namespace. Comparable to how you have users and orgs on github. How a backend manages organizations or projects should not be part of the openeo API spec at this point I think.

So in that sense you indeed need some kind of namespace system for namespaces :stuck_out_tongue:

the @sign is a nice idea for per-user namespaces, but what do you do for other kind of namespaces?

Another solution are namespaces like this:

backend: default namespace of pre-defined processes (alias for null namespace)
user: namespace of authenticated user
user/john: namespace of another user
org/crew123: namespace of an organization
project/doom: namespace of a project
...

The 1.0.0 definition of namespaces is forward compatible with this and it allows backends to invent their own namespace namespace :). It also plays nice when using in URLs (e.g. /processes/user/john and /processes/org/crew123)

soxofaan commented 3 years ago

It also plays nice when using in URLs

while reviewing #348 I realized this is actually not automatically true. It might be better to have another separator than /, e.g. :

m-mohr commented 3 years ago

At VITO, we are for example playing with the idea of per-organization or per-project namespaces. When working with multiple people on the same project it can be useful to have a shared namespace instead of a per-user namespace.

Additional namespaces and how they can be separated from the API-proposed namespaces is then up to the back-end. I feel like namespaces of namespaces is over the top for the API. Also, it could work like in GitHub where a person is the original owner, but with sharing you can grant other persons access to write/delete etc. your process graphs. That was what I had in mind when writing this proposal (e.g. you could do a PUT /processes/@jdries/abc if Jeroen has granted you write access.

How a backend manages organizations or projects should not be part of the openeo API spec at this point I think.

Yes, indeed. So it's not up to the API decide how they will be made available. Through the user ID restrictions (and potentially the @), there are several ways to do that and not conflict with the user ids.

So in that sense you indeed need some kind of namespace system for namespaces 😛

I don't think so and I'd like to avoid this to not make things overly complex by default.

the @sign is a nice idea for per-user namespaces, but what do you do for other kind of namespaces?

Whatever a back-end can do without conflicts. special character disallowed in user ids + org name would work for example. You can still implement parent-namespace:sub-namespace if you like... the API doesn't restrict you to do that.

The 1.0.0 definition of namespaces is forward compatible with this and it allows backends to invent their own namespace namespace :). It also plays nice when using in URLs (e.g. /processes/user/john and /processes/org/crew123)

Slashes don't work very well as OpenAPI doesn't allow un-encoded slashes in path parameters (and it makes the OpenAPI definitions of the endpoints complex) and with URL encoding it's not as nice as it sounds.

jdries commented 3 years ago

I'm wondering if we need to define a separate REST API (microservice) for sharing and discovery of processes? Like if we're opposed to advanced features and namespace rules as part of openEO API, then pulling that into its own spec is perhaps more flexible? In things like openEO platform, we could then have a single process catalog that contains user defined processes for multiple backends. Also think of other features:

process search
advanced rights mgmt
versioning
sharing a process, but hiding the implementation

m-mohr commented 3 years ago

Yes, I had them all on my extensions list. The issue currently is that OpenAPI is not so extensible as one might think. Once you add things to existing endpoints, OpenAPI tooling breaks (currently also struggling with it in STAC, OGC is also having issues).

Apart from that, what we currently do is basically just a unification of the existing behavior and cleaning-up the /process_graphs mess. There's no real additional functionality in the proposal, except listing the namespaces at /processes. But the way it is now is mire flexible for the future.

Sharing will very likely be separate from the Core API, but as mentioned, I'm struggling a bit with OpenAPI.

Open-EO / openeo-api

Unification of /processes and /process_graphs #310