ga4gh / workflow-execution-service-schemas

The WES API is a standard way to run and manage portable workflows.
Apache License 2.0

Workflow file authorization #199

Open br-lewis opened 1 year ago

br-lewis commented 1 year ago

I'm attempting to use Toil to run CWL workflows where the workflow file URLs are going to be behind auth. The documentation doesn't seem to directly address how implementations should handle this situation. Would this fall under TRS? And are there plans for integrating that into WES?

uniqueg commented 1 year ago

There is no WES way of doing this, i.e., there is no specific provision for this case in WES at the moment. I agree that this use case is important, and in principle I think it is relevant not just for descriptor files but any input and even output files (see the related issue in TES: https://github.com/ga4gh/task-execution-schemas/issues/169).

Now, should this be solved in WES or TRS? Or DRS? Or via Passport and work order tokens? Hard to say. Possibly all of the above. I would like to say that the GA4GH Cloud, DURI etc. Work Streams are aware of the general issue, but that being said, it might take a while until this has been designed, agreed upon and will make its way into the specs.

I'm afraid at the moment your best bet is to access the files first, then attach them to the request via workflow_attachment. Alternatively, you could check with Toil whether they are willing to implement an off-spec workaround (perhaps passing credentials in the header).
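A minimal sketch of the first option (fetch the protected file yourself, then attach it), not a definitive implementation. It assumes the file host accepts a bearer token, which may not match your setup. The function names `fetch_protected_file` and `build_wes_run_form` are hypothetical helpers made up for illustration. The form field names (`workflow_url`, `workflow_type`, `workflow_type_version`, `workflow_params`, `workflow_attachment`) are the ones WES defines for `POST /runs`, with `workflow_url` given as a path relative to the attachment.

```python
import json
import urllib.request


def fetch_protected_file(url: str, bearer_token: str) -> bytes:
    """Download a file client-side. Bearer-token auth is an assumption here."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {bearer_token}"}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def build_wes_run_form(
    workflow_name: str,
    workflow_bytes: bytes,
    workflow_params: dict,
    workflow_type: str = "CWL",
    workflow_type_version: str = "v1.0",
) -> list:
    """Build multipart parts for a WES run request (hypothetical helper).

    Each entry is (field_name, (filename, content[, content_type])).
    A filename of None marks a plain form field rather than a file upload.
    """
    return [
        # Relative URL: points at the attached file instead of a remote one.
        ("workflow_url", (None, workflow_name)),
        ("workflow_type", (None, workflow_type)),
        ("workflow_type_version", (None, workflow_type_version)),
        ("workflow_params", (None, json.dumps(workflow_params))),
        # The workflow file itself, already fetched with our own credentials.
        (
            "workflow_attachment",
            (workflow_name, workflow_bytes, "application/octet-stream"),
        ),
    ]
```

The list uses the tuple layout that the `requests` library accepts for its `files=` argument, so if you have `requests` installed it could be sent with something like `requests.post(wes_base_url + "/runs", files=parts)`, where `wes_base_url` is whatever base path your WES server exposes. The WES server then never needs to reach the protected URL itself.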

But perhaps others have better ideas? @patmagee @wleepang @briandoconnor @denis-yuen @kellrott

denis-yuen commented 1 year ago

> I'm afraid at the moment your best bet is to access the files first, then attach them to the request via workflow_attachment.

This seems to make sense for WES.

From a TRS perspective, there's no official position on this. You could run a TRS implementation behind your auth or firewall as the case may be. There is also some rudimentary workflow sharing in Dockstore https://docs.dockstore.org/en/stable/advanced-topics/sharing-workflows.html?highlight=sharing that hasn't made its way into the TRS specification yet.

patmagee commented 1 year ago

This is certainly a gap in the WES, TES and TRS specs at the moment, which partly reflects a greater challenge that is not strictly related to auth/z. The current assumption is either 1) you can solve this in whatever way makes sense, or 2) the engine will have access to the files, so you do not need to worry about it.

Most implementations I have seen tend to err on the side of 2 while still trying to solve for more complex use cases and implementing their own bespoke systems. Part of the problem is that Passport really was not mature enough, or integrated enough to be used when the current version of the spec was first released.

I think there is a real challenge here and it will likely boil down to integrating Passport and work order tokens at each respective level (depending on the implementation). Suggesting different "standard" mechanisms outside of the current work in Passport seems redundant to me and will likely lead to less adoption rather than more. So my guess is that implementations will optionally support Passport, but it is possible that every engine or implementation will still have its own custom way of handling this, or of working with Passport.

Another orthogonal challenge is whether every engine should be able to understand how to localize DRS objects. You immediately run into challenges with egress, object store support and so on that go beyond authorization and are not easy to solve with the current DRS spec.

br-lewis commented 1 year ago

Ok so if I'm reading this correctly the consensus is that this is, at least for now, outside the scope of WES and left up to the individual implementations to handle out of spec until there is a better idea of how to handle these issues. Is that right?

I've looked at TRS and DRS some but can't seem to find a list of available implementations. Does that exist somewhere? Are there implementations that are at a reasonable level of maturity? Sorry if these questions seem obvious, I stumbled into these projects via Toil and I'm not even working in bioinformatics so I don't have much context for the GA4GH or the Cloud Workstream effort but this does seem relevant to what I'm trying to do.

Those questions aside, if this is out of scope for WES then I believe this issue can be closed.

coverbeck commented 1 year ago

Dockstore and WorkflowHub are TRS implementations. Disclaimer, I work on Dockstore.

uniqueg commented 1 year ago

@br-lewis: Yes, I guess that would be a good summary of the current state.

Re: TRS implementations, @coverbeck listed the major implementations available as public services. We (ELIXIR) also have a more lightweight, generic implementation that could be set up by an individual person or org, e.g., a company that wants to restrict access to workflows to its employees, or to be used as a workflow backend in web portals dealing with workflows. You could have a look here: https://github.com/elixir-cloud-aai/trs-filer

Note, however, that so far it's only been used in PoCs, and we currently still don't have access control and a few other things integrated (see the issue board for an idea of what's missing at least - there are certainly more).

As for closing the issue: I would rather wait with that until the entire process of accessing protected resources is discussed in a more central location and on the Cloud WS roadmap.