ga4gh / workflow-execution-service-schemas

The WES API is a standard way to run and manage portable workflows.
Apache License 2.0

Request to Add `DELETE /runs/{run_id}` Endpoint #218

Open suecharo opened 2 months ago

suecharo commented 2 months ago

Hi there,

We're developing a WES called Sapporo, which we're using as the backend for our web service. This service accepts requests from both authenticated and anonymous users to execute specific workflows with file attachments.

We've received requests from users to be able to delete the input and result files associated with their workflow runs after execution. To address this, we would like to introduce a DELETE /runs/{run_id} endpoint to explicitly clean up related files from the WES server.
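To make the proposal concrete, here is a rough client-side sketch of what calling such an endpoint might look like. It only builds the request and does not send it. The function name `build_delete_run_request`, the bearer-token header, and the example base URL are assumptions for illustration; the endpoint itself is only proposed here and is not part of the current WES specification.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def build_delete_run_request(
    base_url: str, run_id: str, token: Optional[str] = None
) -> Request:
    """Build (but do not send) a DELETE request for a single run.

    Hypothetical helper: DELETE /runs/{run_id} is a proposal, not an
    existing WES operation.
    """
    # Percent-encode the run ID so it is safe to use as one path segment.
    url = f"{base_url.rstrip('/')}/runs/{quote(run_id, safe='')}"
    request = Request(url, method="DELETE")
    if token is not None:
        # Authenticated users send credentials; anonymous users omit them.
        request.add_header("Authorization", f"Bearer {token}")
    return request


# Example: build a request against an assumed base URL.
example = build_delete_run_request(
    "https://wes.example.org/ga4gh/wes/v1", "run-123", token="abc"
)
print(example.get_method(), example.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`. Which status code and response body the server should return is still open for discussion.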

Thanks as always!!

uniqueg commented 1 month ago

I think there is some push in the Cloud Work Stream to keep the APIs as lightweight as possible. For example, the DRS and TRS endpoints have no operations to create resources. And none of DRS, TRS, WES and TES currently have operations to delete resources, as you suggest.

While I understand the reasoning behind keeping the APIs slim and easy to implement, I do think that standardizing such operations would be incredibly useful towards increased interoperability and richer functionality. My suggestion would be to publish spec extensions for such operations. This would strongly indicate that implementing these operations is optional, while still suggesting a unified way for services to implement them - if they choose to do so (or are required to do so by law, as in GDPR).

Would that be acceptable to you, @suecharo?

manabuishii commented 1 month ago

@uniqueg @suecharo

Specification extensions are a good idea. As I understand it, the idea is to let implementers make choices according to their situation without putting too much into the core specification. Is that correct? (GDPR and similar requirements could then be taken into account.)

In this case, do you imagine that the actual implementation would be that when DELETE is called, you can choose any of the following?

suecharo commented 1 month ago

Thank you, @uniqueg .

> I think there is some push in the Cloud Work Stream to keep the APIs as lightweight as possible.

Indeed, as Manabu mentioned, this direction seems promising. In Sapporo, we are actually extending some parts as well.

I think we need to consider the proper format for these extensions. For example, it would be beneficial to have fields in the ServiceInfo for extensions.

Currently, the ServiceInfo only has:

"supported_wes_versions": [
    "string"
],

However, a format like this could be more useful:

"supported_wes_versions": [
    "1.1.0",
    {
        "base_wes_version": "1.1.0",
        "version": "sapporo-wes-1.1.0",
        "openapi_url": "https://example.com/sapporo-wes-1.1.0-openapi-spec.json"
    }
],

This would allow for more detailed versioning and provide direct links to the relevant API specifications.
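As a rough illustration of how a client might consume such a mixed list, the sketch below normalizes plain version strings and extension objects into one shape. The field names follow the example above; the function name `normalize_versions` and the choice to treat a plain string as its own base version are assumptions, not part of any spec.

```python
from typing import Any, Dict, List


def normalize_versions(entries: List[Any]) -> List[Dict[str, Any]]:
    """Normalize a mixed list of version strings and extension objects."""
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            # A plain string is a core WES version with no extension spec.
            normalized.append(
                {"base_wes_version": entry, "version": entry, "openapi_url": None}
            )
        elif isinstance(entry, dict):
            # An object describes an extended version built on a core one.
            normalized.append(
                {
                    "base_wes_version": entry["base_wes_version"],
                    "version": entry["version"],
                    "openapi_url": entry.get("openapi_url"),
                }
            )
        else:
            raise TypeError(f"Unsupported version entry: {entry!r}")
    return normalized


supported = [
    "1.1.0",
    {
        "base_wes_version": "1.1.0",
        "version": "sapporo-wes-1.1.0",
        "openapi_url": "https://example.com/sapporo-wes-1.1.0-openapi-spec.json",
    },
]
for item in normalize_versions(supported):
    print(item["version"], "based on", item["base_wes_version"])
```

One trade-off worth noting: a list that mixes strings and objects forces every client to branch on the entry type, so a separate field for extensions might be easier to consume.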

uniqueg commented 1 month ago

Hi @suecharo,

I had previously proposed a mechanism to broadcast the extensions to a given API that a given service supports via the service info, similar to what you suggest (though I have called it capabilities): https://github.com/ga4gh/TASC/issues/45

How I imagine this to work is that if a service implements the delete_workflow_resource extension/capability, it is expected to delete the indicated resource and associated objects (as you suggest) when DELETE /runs/{run_id} is called. Whether that is supposed to happen "immediately" or "at an appropriate time" is either up to us to debate when defining the extension/capability, or we leave it up to implementers (similar to POST /runs/{run_id}/cancel).

On the other hand, if the extension/capability is not implemented, the behavior for calling DELETE /runs/{run_id} is undefined. It might be that the operation is not implemented at all (which might, e.g., cause a 404), or a different behavior is implemented for that operation (e.g., only the resource, but not the associated objects, is deleted), or it is implemented in the exact same way and the service just doesn't advertise the capability (which would be a pity).

So the difference is that in the one case, a WES client can reasonably expect the implementation to behave in a clearly defined manner for that operation (just as for any other WES operation), whereas in the other case, the client can set no expectations with regard to the availability or behavior of that operation at all.

In practice, a client would need one additional call, to the service info endpoint, to verify whether a resource can be deleted as expected. A GUI client, e.g., could first check whether deletions are supported in a given WES instance before activating a "Delete" button for a given run. If deletions are not supported, the button could be hidden or grayed out with a tooltip saying "Not supported by this WES instance".
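A minimal sketch of that client-side check could look like the following. It assumes the service info response carries a hypothetical `capabilities` list of strings; that field is not part of the current service info schema, and the exact format proposed in the TASC issue may differ. The helper names and the "Delete this run" tooltip are also made up for illustration.

```python
from typing import Any, Dict, Mapping

# Capability name as used in this discussion; not yet standardized.
DELETE_CAPABILITY = "delete_workflow_resource"


def supports_delete(service_info: Mapping[str, Any]) -> bool:
    """Return True if the service advertises the delete capability."""
    # "capabilities" is an assumed field; a missing field means unsupported.
    return DELETE_CAPABILITY in service_info.get("capabilities", [])


def delete_button_state(service_info: Mapping[str, Any]) -> Dict[str, Any]:
    """Decide how a GUI should render the "Delete" button for a run."""
    if supports_delete(service_info):
        return {"enabled": True, "tooltip": "Delete this run"}
    return {"enabled": False, "tooltip": "Not supported by this WES instance"}


# Example with a service info document that was already fetched and parsed.
info = {"supported_wes_versions": ["1.1.0"], "capabilities": []}
print(delete_button_state(info))
```

A client could cache the service info per WES instance, so the extra call is made once rather than once per run.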

Also, if required by local regulations (e.g., GDPR), a given client could allow connections only to those WES instances that support resource deletions. A GA4GH Service Registry API service that allows querying for WES instances that support a given extension/capability could help with easily identifying such services. WES clients could then be restricted to only talk to WES instances that are listed in the registry service and that also implement the requested extensions.
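The registry-based restriction could be sketched roughly as below. The record shape (a `url` plus a `capabilities` list) and the function name are assumptions for illustration only; the Service Registry API does not currently expose capabilities in this form.

```python
from typing import Any, Iterable, List, Mapping


def select_compliant_services(
    services: Iterable[Mapping[str, Any]], required: Iterable[str]
) -> List[str]:
    """Return URLs of services advertising every required capability."""
    required_set = set(required)
    return [
        service["url"]
        for service in services
        # Services without a "capabilities" field are treated as supporting none.
        if required_set.issubset(service.get("capabilities", []))
    ]


# Hypothetical registry records.
registry = [
    {"url": "https://wes-a.example.org", "capabilities": ["delete_workflow_resource"]},
    {"url": "https://wes-b.example.org", "capabilities": []},
    {"url": "https://wes-c.example.org"},
]
allowed = select_compliant_services(registry, ["delete_workflow_resource"])
print(allowed)
```

A client bound by deletion requirements would then connect only to the URLs in `allowed`.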

Anyway, while an extension mechanism is surely useful, I am not saying that DELETE /runs/{run_id} couldn't or shouldn't make it to the core specification. Personally, I actually think it should, because I feel that a resource owner should always have the freedom to delete their resources. This was really just to open a pathway to still define this behavior if the majority of people think that it should not be part of the core specification.

I would suggest that you open up a PR in which you just add the endpoint to the WES core specs. We can then discuss how people feel about it. You could also add a similar PR for TES.