ga4gh / workflow-execution-service-schemas

The WES API is a standard way to run and manage portable workflows.
Apache License 2.0
82 stars 38 forks source link

Add Workflow engine and version to RunWorkflow parameters #181

Closed vsmalladi closed 1 year ago

vsmalladi commented 2 years ago

As workflow engines evolve there will be a difference between workflow language definition vs the version of the runner. I am suggesting we add two new required parameters to include in the WES spec to support running workflows that are actually supported by and tested by the developers of the workflow in question.

If a workflow is on Dockstore and has only been tested on Cromwell version 79, you should be able to run on Cromwell and can test with different Cromwell versions. WDL can be made with the new 1.1 spect, but Cromwell won't support that and thus WES with only Cromwell and no MiniWDL support wouldn't be able to run the 1.1 spec version. With Nextflow there is a DSL-1/DSL-2 of the langauge that has different support for version of the Nextflow runner.

Overall I think this will provide the user more control on what runners and versions they can run their workflow so they can be confident that their workflow runs.

Suggesting to add these:

  The `workflow_runner` is the runner of workflow such as  "Cromwell" or "Nextflow" or "MiniWDL".
  The `workflow_runner_version` is the version of the workflow runner submitted and must be one supported by this WES instance.
patmagee commented 2 years ago

@vsmalladi thanks for suggesting this! I agree, that as we see broader adoption of WES we will run into this issue more and more. Especially when we have multi-engine backends behind a singular WES entry point.

I have a few suggestions, so interested to hear what you think:

vsmalladi commented 2 years ago

@patmagee Here are some of my thoughts

uniqueg commented 2 years ago

I agree with the need for this, and I also agree with defaults being broadcast so that, e.g., one can also pull an appropriate WES from a service registry ("give me a WES for engine X version Y"). Semantically and for consistency, I tend to not favor including these in the workflow_enginge_parameters (those are rather things that you would pass as CLI args to a given workflow engine of a given version).

Couple of thoughts here:

I therefore think that

note that there's no workflow type version for SMK

Tagging some more people: @denis-yuen @svedziok @vinjana

vsmalladi commented 2 years ago

Shouldn't the engine version be able to error correctly based on the workflow_type version and repo that is provided?

So if you say wdl: 1.1 and cromwell 79 then here are the errors I see:

  1. WES API fails if wdl 1.1 and cromwell 79 are not supported
  2. If combination is not support workflow engine will pass that error back to WES client

Otherwise keeping this combination will be very onerous on the implementer

svedziok commented 2 years ago

I agree that support for multiple engine versions on the WES side is necessary for reproducibility. However, documenting all valid combinations of language and engine versions can become very complex for the WES implementer/provider. From the WES provider side, it would be much easier to support only specific engine versions (for nextflow, cromwell, cwltool, toil, etc.) and give this information back to the user. The engine should be able to handle the language version of a specific run and return an error if a workflow language version is not supported by the chosen engine.

uniqueg commented 2 years ago

My last point was motivated by the idea that it would be rather terrible user experience if if a client first needed to test WES instance by instance to see if there's a WES that supports running a given workflow, especially if it involves user interaction. But thinking about it, I agree with you @vsmalladi and @vsmalladi, returning an error is probably not too terrible, because it is arguably a fair requirement on users to check themselves if a given workflow engine (version) is capable of running a workflow of a specific type and version.

Perhaps it's helpful to play this through a bit and summarize the current state as well as the one proposed in #182:

  1. Presumably, in most cases, users just want their workflows to run. For that they'd need to know the supported workflow types and, if applicable, workflow type versions. For that, the existing schema in the service info, a composite of ServiceInfo and WorkflowTypeVersion suffices, although the property names are rather confusing:

    workflow_type_versions:
       CWL:
           workflow_type_version: ["1.0", "1.1", "1.2"]
       SMK:
           workflow_type_version: [""]

    Now a client knows that they can send most SMK and CWL workflows to the service, but not NFL workflows.

  2. Then, sometimes, a user/client would like a specific engine or engine version to be used. Any WES would only have a definite and limited number of workflow engines / workflow engine versions that they wrap. So this could be easily broadcast separately, as is already partly provided for in the current specs:

    workflow_engine_versions:
       toil: "3.1.8"
       cwltool: "2.0.20200126090152"
       snakemake: "6.15.5"

    Now a user knows that their brandnew Snakemake workflow that requires version >7.0 features wouldn't run on that particular WES. For CWL it becomes more complex in this example. I might know from workflow_type_versions that CWL 1.2 workflows are supported by this WES, but there are several engines for CWL. If I wanted to use cwltool to run a 1.2 workflow, I would be out of luck and receive an error (the listed cwltool version doesn't support 1.2) - which is acceptable if it's my responsibility to verify that the supported cwltool version is able to run my workflow. Similarly, if a user/client provided a nonsensical combination like snakemake for the workflow engine when the workflow type is CWL, they arguably deserve their error well.

    What the current specs do not provide for, is that a WES may support multiple versions of the same engine (because the additionalProperties of the workflow_engine_versions property are of type string, unlike in the WorkflowTypeVersion schema. #182 proposes to rectify this by replacing the string type with a reference to an object WorkflowEngineVersion with a single property workflow_engine_version of type array, analogous to how workflow types and versions are broadcast:

    workflow_engine_versions:
       toil:
           workflow_engine_version: ["3.1.8", "5.7.1"]
       cwltool:
           workflow_engine_version: ["2.0.20200126090152"]
       snakemake:
           workflow_engine_version: ["6.15.5"]

So in terms of broadcasting workflow types and type versions, the proposed solution covers all the angles I can think of. But one thing might still be worth discussing:

patmagee commented 2 years ago

This is a really great discussion and It looks like we ended up in pretty good place. FWIW, I also agree that trying to map the supported workflow types and version to every single engine would be very tedious for the engine implementer, and likely would be tedious for the user as well. To that end, I like the initial suggestion from @vsmalladi with the tweak you suggested @uniqueg to make it more inline with the workflow_type_versions .

In regards to the range selectors, I do not think that it really makes sense for the current use case. I doubt an implementer would allow for a dynamic range of engines as opposed to offering several discrete versions of a single engine. For example If you specified a range of >=1.0 <=1.2 would a user be able to select ANY version within that range?

uniqueg commented 2 years ago

I was thinking of ranges more for specifying supported workflow type versions (rather than engine versions).

The workflow types I am familiar with only have a couple of versions, but there may be some with frequent updates and where engines are backward compatible with respect to the workflow type versions they support. In such cases it would be tedious to enumerate every one of them.

However, it's the definition of feature creep to try to address problems that nobody ever ran into. If we ever get a complaint we can still reevaluate.

uniqueg commented 2 years ago

I was thinking of ranges more for specifying supported workflow type versions (rather than engine versions).

The workflow types I am familiar with only have a couple of versions, but there may be some with frequent updates and where engines are backward compatible with respect to the workflow type versions they support. In such cases it would be tedious to enumerate every one of them.

However, it's the definition of feature creep to try to address problems that nobody ever ran into. If we ever get a complaint we can still reevaluate.

vsmalladi commented 2 years ago

Ya i think it worth making a new issue for this range discussion. But this also goes into discussion about how many legacy versions of an engine are support and for how long.

uniqueg commented 2 years ago

I don't even think it's worth opening an issue. We can leave that up to the first person running in to that problem :)