Open-EO / openeo-api

The openEO API specification
http://api.openeo.org
Apache License 2.0

Workspaces API #518

Closed m-mohr closed 1 month ago

m-mohr commented 1 year ago

Implements #135. ~Partially implements #376.~ Creates the basics for #450. Related to the processes PR: https://github.com/Open-EO/openeo-processes/pull/485

jdries commented 1 year ago

Trying to summarize the current proposal:

GeraldIr commented 11 months ago

I do have some input on this after implementing our proof-of-concept:

These are my initial opinions on this extension. Excuse the delay in my response, but there was not much for me to add before. If I think of anything else I'll be sure to make it known.

m-mohr commented 11 months ago

Thank you for the feedback @GeraldIr. Please see my comments below.

> I think there's no point in having a user-facing endpoint for changing the quota of a given bucket. This is either irrelevant for external workspaces, or shouldn't be controlled by the user for openeo-created workspaces.

Okay, but this is purely for PATCH /workspaces/:id, right? I've left it in POST /workspaces for creation. Also, I kept PATCH /workspaces/:id because I added a title and description for workspaces. We have that all through openEO and should help to organize and describe the workspaces, especially also for sharing later.
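To illustrate the split described above, here is a minimal sketch of the two request bodies: quota can be supplied when a workspace is created via POST /workspaces, while PATCH /workspaces/:id only touches title and description. The function names and the `quota` field name and unit are assumptions for illustration and are not taken from the specification.

```python
from typing import Any, Dict, Optional


def create_workspace_body(
    title: str,
    description: Optional[str] = None,
    quota: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a body for POST /workspaces (field names are assumptions)."""
    body: Dict[str, Any] = {"title": title}
    if description is not None:
        body["description"] = description
    if quota is not None:
        # Assumed field name and unit (bytes); only accepted at creation time.
        body["quota"] = quota
    return body


def update_workspace_body(**changes: Any) -> Dict[str, Any]:
    """Build a body for PATCH /workspaces/:id, limited to descriptive fields."""
    allowed = {"title", "description"}
    rejected = set(changes) - allowed
    if rejected:
        raise ValueError(f"not user-editable: {sorted(rejected)}")
    return dict(changes)
```

For example, `update_workspace_body(title="Shared results")` yields a body with only the new title, whereas `update_workspace_body(quota=1)` raises a `ValueError`.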

> This also goes for the "provisioning" status, and by extension for availability of workspaces more generally. Initializing a bucket (for s3 in my experience) is incredibly fast and a user needs to receive their credentials at the end of a call to the endpoint anyways. If a bucket is unavailable it should just not show up in the list workspace endpoint and the creation shouldn't be queued but the call should only return when the workspace is ready (like the synchronous processing endpoint).

Is this generally the case? We can't just base our assumptions on AWS and need more evidence across more providers. I assume EOEPCA has thought about this a bit more, so I'd like to keep this as it is until we have more evidence.

> There should be an endpoint for sharing access to an openeo-created workspace with other openeo users, it should be possible to do this granularly, for instance only sharing the results of a single job, or sharing a single folder/file.

We don't really have established sharing in openEO in general. We should probably establish this as a common concept in openEO after creating the general API for workspaces and then also apply this here.

> There should be an option for storage type when registering an external workspace, so you can specify if it is for instance s3, Azure blob storage or anything else a backend decides to implement along with a list of supported storage types returned in some endpoint. This info is vital for interacting with a workspace for the backend and might not be apparent from the url or credentials.

So I guess we need something like GET /workspace_types (comparable to GET /service_types)? What would we need to describe in there apart from title and description? And then a user can optionally also choose the workspace type during workspace creation?
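As a rough sketch of that idea, the snippet below models a hypothetical GET /workspace_types response as an object keyed by type name, carrying only a title and description, and checks an optional type chosen at workspace creation against it. The type names, the `url` and `type` fields and the helper function are assumptions for illustration; none of them are defined by the spec.

```python
from typing import Any, Dict, Optional

# Hypothetical response of GET /workspace_types, keyed by type name.
WORKSPACE_TYPES: Dict[str, Dict[str, str]] = {
    "s3": {
        "title": "S3-compatible object storage",
        "description": "Buckets on AWS S3 or S3-compatible services.",
    },
    "azure-blob": {
        "title": "Azure Blob Storage",
        "description": "Containers in an Azure storage account.",
    },
}


def registration_body(
    url: str,
    workspace_type: Optional[str] = None,
    supported: Dict[str, Dict[str, str]] = WORKSPACE_TYPES,
) -> Dict[str, Any]:
    """Build a body for registering an external workspace (assumed fields)."""
    body: Dict[str, Any] = {"url": url}
    if workspace_type is not None:
        # The type is optional, but if given it must be one the backend lists.
        if workspace_type not in supported:
            raise ValueError(f"unsupported workspace type: {workspace_type!r}")
        body["type"] = workspace_type
    return body
```

A client could populate a drop-down from the keys of the response and pass the selection as `workspace_type`.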

> I think saving results in a workspace should be configurable via an option or argument in any relevant result node process. Ideally the user gets a drop-down created from the get /workspaces endpoint for choosing which one to save to (potentially multiple).

How to interact with the workspace in the processes is up for discussion, as mentioned in the Teams chat in Oct and Nov. Let's discuss this in the openEO community call on the 6th of December.

Best, Matthias

jdries commented 11 months ago

@GeraldIr do you happen to have a pointer to the demo requests that you showcased? Or otherwise the implementation itself? Have we considered how this will work in a federated setup? Are there calls for other backends to retrieve the info and credentials to be able to store something in a workspace?

m-mohr commented 11 months ago

I've updated the PR to only be the workspace API. The following PR adds a process to store data to a workspace: https://github.com/Open-EO/openeo-processes/pull/485

User Collections are now defined via a STAC API extension + openEO processes:

GeraldIr commented 10 months ago

This is a short write-up of the main differences between our proof-of-concept and this specification:

Main differences for GET calls between our implementation and this specification are individual metadata descriptions. For instance in our proof-of-concept there is no concept of a storage quota yet. Nothing major though.

We did have different endpoints for registering and creating workspaces. I wouldn't be opposed to combining them into a single POST, but the backend logic for the two actions is fundamentally different, as are the payload and response.

Creating and deleting are functionally identical otherwise.

Updating the workspace is as of yet unimplemented.

There is also functionality for sharing results/workspace access with other users, which necessitates a "register-user" and a "share-workspace" endpoint. Register-user allows a user to get credentials without having to create a workspace first; share-workspace allows you to open up a workspace for other users. This way nobody has to send access credentials around. N-times-use signed links could be a more abstracted alternative that does not require any step from the user the results are being shared with.

Saving results to a workspace is supported by adding a "workspace" option with the workspace title/ID to save_result, but this isn't specifically part of this spec.

m-mohr commented 10 months ago

@GeraldIr Do you plan to update your implementation according to this spec? What would you like to change and why? As we are both in SAP07, I think the expectation is that we come up with a common solution.

Apart from register-user and share-workspace, I don't see immediate todos for me. I'm not quite sure yet what register-user does. share-workspace is something I'd not define yet as it's a bigger thing we need to discuss for openEO so a custom solution from your side is fine for me.

Storage quota is optional and comes from the EOEPCA API you've pointed me to.

Also, do you plan to update your implementation to support https://github.com/Open-EO/openeo-processes/pull/485 instead of using the save_result workspace parameter?

GeraldIr commented 10 months ago

@m-mohr

MinIO has a list of users, which in reality is just a list of access credentials (access and secret keys) that are linked directly to policies that allow them to be used for accessing buckets. register-user allows one to create such a set of credentials without the extra step of creating a bucket. Then we can simply add policies for that pair of credentials (which represents a user) and share workspace access that way.
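For illustration, a policy of the kind described above could look like the following sketch, which builds a read-only, IAM-style policy document (the JSON policy syntax MinIO accepts) for one bucket. The function name, bucket name and prefix are hypothetical, and this is not EODC's actual implementation.

```python
import json
from typing import Any, Dict


def read_only_bucket_policy(bucket: str, prefix: str = "") -> Dict[str, Any]:
    """Build a read-only policy document for a bucket (illustrative only)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the whole bucket here; restricting it
                # to the prefix would need a Condition, omitted for brevity.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object reads are limited to keys under the given prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


# Hypothetical bucket and job folder, serialized as it would be stored.
policy_json = json.dumps(read_only_bucket_policy("workspace-123", "job-1/"), indent=2)
```

Attaching such a document to the credentials created by register-user would grant that user read access without exchanging the owner's keys. Scoping by prefix is one assumed way to share only a single job's results.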

Yes, I think changing away from the workspace parameter would be the preferred way to go so I will be implementing it the proposed way.

m-mohr commented 10 months ago

@GeraldIr Thanks.

I effectively don't see a difference whether I submit the credentials during the workspace creation or upfront in register-user and then create the workspace with the user id. In both cases you need to send the credentials through a POST request to the server, in both cases you usually just send it once, right? Can you clarify why the separate register-user request is your preferred way?

I understand that you confirmed updating the processes, but could not understand from your reply whether you aim to align the HTTP APIs or not?

GeraldIr commented 10 months ago

@m-mohr

Register user is functionally only useful for sharing results, so as long as that isn't in the actual draft you can probably just ignore it. When creating a workspace for the first time that part is abstracted away from the user regardless.

And yes I can align the rest of our API with the specifications, that shouldn't be a problem.

m-mohr commented 10 months ago

Great. If you have any feedback for the spec, please let me know. It's not set in stone at all...

jdries commented 10 months ago

@GeraldIr we would also like to register workspace metadata. Is the source code for the component that you built in the SAP available somewhere? Or how does it integrate with the rest of the platform?

m-mohr commented 10 months ago

Generally, @GeraldIr when is it planned to close SAP07?

GeraldIr commented 3 months ago

We've recently made progress on the backend implementation of workspaces and would be ready to discuss finalizing this draft/working out all the details. (maybe in the next developer meeting for openEO on the 7th of August, or a standalone meeting).

As for a (non-comprehensive) list of topics I think we should discuss and come to a final consensus on:

Also, if there are any questions about the implementation that wouldn't fit this thread, you can send me a private message and I'm happy to discuss.

m-mohr commented 3 months ago

@GeraldIr Based on your current implementation, are any changes required to this PR #518 or https://github.com/Open-EO/openeo-processes/pull/485 ?

Currently, there is no work planned for front end integration into the Web Editor. Nevertheless, is the current implementation deployed somewhere for testing?

GeraldIr commented 3 months ago

@m-mohr There are some minor differences, like having register workspace and create workspace on two different endpoints. There are also some major differences in how we handle integration with actual process graphs, but these won't need any adjustment on your end: we currently handle loading and saving in load_collection and save_result respectively because it was more convenient for now, and we will switch to https://github.com/Open-EO/openeo-processes/pull/485 as soon as that is finalized.

The only required change to the spec that I can see right now is a register_user endpoint, which our implementation relies on. It lets users get credentials for the underlying object storage without actually creating a workspace, so that other users can share their results/workspaces with them.

As for a deployed version, yes, the current version is up and running and demos/tutorials on how to use it can be found here https://github.com/eodcgmbh/eodc-examples/tree/main/demos/workspaces

If you have any questions or encounter any bugs while looking through this, you can just create an issue over in that repo or bring them up in today's meeting.

GeraldIr commented 2 months ago

@m-mohr any updates on this?

m-mohr commented 2 months ago

@GeraldIr Can you point me to any specific documentation around the register user endpoint? Is this what is shown in https://github.com/eodcgmbh/eodc-examples/blob/main/demos/workspaces/demo-register-workspace.ipynb?

(Anyway, I might have issues working on this before November. This SAP has been delayed so much...)

m-mohr commented 2 months ago

Updated according to the discussions that I had today with @GeraldIr:

EODC also implements a register-user endpoint that allows granting additional users access to a workspace. As we generally have very weak user management and sharing capabilities, we left it out here, but we could eventually adopt it from EODC if others have a need for it.