ga4gh / workflow-execution-service-schemas

The WES API is a standard way to run and manage portable workflows.
Apache License 2.0

Optional ServiceInfo on supported workflows #190

Open vinjana opened 2 years ago

vinjana commented 2 years ago

Ideally, a WES server can execute any workflow whose software can, e.g., be retrieved as OCI containers (built for its platform, e.g. amd64, and convertible into the required container format, e.g. Singularity ;-) ). However, some WES implementations may not allow arbitrary workflows to be downloaded or uploaded, e.g. due to security policies, limited backend implementations, or restrictive firewall settings. Instead, only "certified" workflows or workflow versions may be allowed.

Would it make sense to communicate which workflows are supported via the ServiceInfo? This could be a simple list of URIs (e.g. to Git repositories). The field should be optional, also for backwards compatibility: if the field is not returned, this would mean "every workflow". A possible name for the field could be supported_workflows, or maybe just workflows.
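A minimal sketch of how a client might interpret such an optional field, assuming it is a top-level `supported_workflows` list of URIs in the ServiceInfo response (the name and placement are only the suggestion above, not part of the WES spec; the example values are hypothetical):

```python
from typing import Any, Mapping


def is_workflow_supported(service_info: Mapping[str, Any], workflow_url: str) -> bool:
    """Check a workflow URL against the hypothetical `supported_workflows` field.

    The field is optional: if it is missing (or null), the server is taken to
    accept every workflow, so older servers that never return it stay compatible.
    """
    supported = service_info.get("supported_workflows")
    if supported is None:
        return True  # field absent: "every workflow"
    return workflow_url in supported


# Hypothetical ServiceInfo fragments.
legacy_info = {"id": "org.example.wes"}
restricted_info = {
    "id": "org.example.wes",
    "supported_workflows": ["https://github.com/example/certified-wf.git"],
}

print(is_workflow_supported(legacy_info, "https://example.org/any.cwl"))  # True
print(is_workflow_supported(restricted_info, "https://example.org/any.cwl"))  # False
```

Treating a missing field as "allow everything" is what keeps existing servers valid without any change.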

patmagee commented 2 years ago

@vinjana definitely. I know of a few situations raised at the plenary that would directly require this feature in order to fit into the broader landscape of executors.

I think this could take a few forms, one of them being the one you suggested. A potentially more complete approach would look like this:

```json
{
  "supported_workflows": {
    "repositories": ["https://...../trs"],
    "formats": ["trs", "http"],
    "workflows": [
      "workflow_1",
      "http://foo.com/workflow1"
    ]
  }
}
```
patmagee commented 2 years ago

I would probably also take the stance that, by default, any URI is allowed, and this field would likely only be used to convey restrictions.

vinjana commented 2 years ago

I guess the use case for repositories is to allow all workflows from specific repositories, e.g. because these are certified and under control, or because the WES knows the communication protocol. That makes sense to me.

What is the use case for formats? What about specifying protocols in the repositories, e.g. with trs:// URIs? This could be used to specify specific versions from Git repositories, right?

And workflows also contains an HTTP URI. I guess this should be ORed with the repositories (i.e. all workflows from the repositories plus the ones listed by URI in workflows).

What about workflow_1?

patmagee commented 2 years ago

Yeah @vinjana, you have repositories right. It's basically a set of URIs pointing ideally to TRS repos, but also to any set of workflows. The formats field is more of a general way to show which URI schemes are supported. For example, not every API will support TRS URIs (they may work fine with TRS URLs, but not with the identifiers).
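To illustrate the TRS URL vs. TRS URI distinction, here is a sketch of a `formats` check that treats each entry as a URI scheme. Letting "http" also cover https is an assumption of this sketch, and the example URIs are hypothetical:

```python
from typing import Sequence
from urllib.parse import urlsplit


def scheme_supported(workflow_uri: str, formats: Sequence[str]) -> bool:
    """Check the URI scheme against a hypothetical `formats` list.

    Assumption: "http" in `formats` also covers "https"; every other entry
    (e.g. "trs") must match the scheme exactly.
    """
    scheme = urlsplit(workflow_uri).scheme  # urlsplit lower-cases the scheme
    if scheme == "https":
        scheme = "http"
    return scheme in formats


# A server that accepts TRS URLs over HTTP(S) but not trs:// identifiers.
formats = ["http"]
print(scheme_supported("https://trs.example.org/ga4gh/trs/v2/tools/wf-a/versions/1.0", formats))  # True
print(scheme_supported("trs://trs.example.org/wf-a/1.0", formats))  # False
```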

The last example, workflow_1, may be a bad example. I was imagining that systems which lock down workflow execution might have a unique identifier for their workflows that is not strictly a URI. If that is not the case, the idea should be ignored haha

vinjana commented 2 years ago

Ah, OK, then it would also be fine to, e.g., use a GitHub organization or a GitLab group and implement it such that all workflows in that organization/group are allowed. Seems like a nice proposal, because you can specify individual Git repo tags, but also repositories, groups of repositories, or even servers. :+1:

uniqueg commented 2 years ago

I certainly think that we should come up with an elegant, extensible way of broadcasting capabilities through the service info (in both WES and TES), but for this particular issue I am not so sure: it seems to me to be in the domain of access control. One could argue the same for a list of users, upstream clients, downstream clients, etc., and I'm not sure the service info is the right place to specify such restrictions. From what little I (admittedly) understood about work order tokens, this may be something that could be addressed through them?

vinjana commented 2 years ago

I agree that allowed workflow sources, or even allowed workflows, are somehow related to authorization and permissions. It seems more likely that an implementer wants to regulate which workflows each user can execute (e.g. for legal or license reasons) than, for instance, which workflow engines or versions can be executed.

I also have to admit that I did not understand the concept of the work order token. Maybe somebody with more knowledge could comment on this. From an OAuth2 perspective, the authorization server may communicate such information to the WES server (e.g. via the JWT), so that these kinds of permissions can also be retrieved from the authorization server if the client wants to know what it is or isn't allowed to do.

What do you think @patmagee?

uniqueg commented 1 year ago

@vinjana: To provide a more practical answer, I think this can be done relatively easily through clients. Say you have a web portal through which users trigger workflow runs. A WES instance could be firewalled off to only listen to requests from that web portal, and through the web portal you could make sure that only certain workflows can be run (for example, by providing an organization-wide TRS with appropriate access control). If you are interested in that, we can discuss this elsewhere, as we either have or are working on most of the components required for such a solution in the ELIXIR Cloud & AAI Driver Project.

vinjana commented 1 year ago

Some thoughts on your proposal, @uniqueg:

If the WES administrator wants to restrict workflows to specific servers, I agree that they could simply add outbound firewall rules that allow access only to, e.g., a specific TRS server or GitHub.

But filtering on full URLs, or even on user names, via a firewall? Would that not conflict with SSL/TLS encryption? I am not sure whether that is possible without restricting the use case, e.g. by forcing users to host their own TRS or Git repository.

Finally, the proposal was less about how filtering is implemented and more about communicating what is filtered. It depends a bit on the use case: if you think of a very open infrastructure, with possibly many WES servers instantiated organically by diverse organizations, I can imagine that communicating access restrictions via the ServiceInfo becomes relevant, independent of whether you use WES code, firewalls, or anything else.

uniqueg commented 1 year ago

Well, as I said, hosting a private TRS together with your web portal and firewalling off the WES so that it only listens to that portal would be a practical workaround covering some of what you describe. I am aware that this doesn't cover all use cases. For those, I don't know more than what I wrote previously, i.e., work order tokens may do the trick.