ga4gh / workflow-execution-service-schemas

The WES API is a standard way to run and manage portable workflows.
Apache License 2.0
82 stars 38 forks source link

Support and standard keyword for WES #173

Open jb-adams opened 2 years ago

jb-adams commented 2 years ago

Currently, WES supports standard keywords for CWL and WDL workflow languages. Can we add a new workflow_type keyword supporting Nextflow-based workflows, perhaps NEXTFLOW?

adeslatt commented 2 years ago

Perhaps we are thinking about this incorrectly. Jupyterlab notebooks can be translated from R, ipynb, etc. using the tool jupytext Kernels are specified when running jupyterlab notebooks using rendering tools such as papermill. We should say that a standard workflow specification could be articulated and stored in Common Workflow Language (as does the EU with their Go-FAIR plans and as stated on the Elixir site) but that we have translators that translate the CWL to Nextflow, WDL, Snakemake etc. Separating out concerns solves the problem and evolved best practices would be similar to the literate programming achieved with jupyterlab notebooks rendered in ipynb - allowing for the normal marked down section and then the execution cell.

What i suggest is that we have a language that is specified for storage. Most of these allow for processes to be specified, Nextflow and CWL, with inputs, outputs and the process specified with bash level scripting for the execution portion. That process should be a container where just that process is contained within it. This lends itself to continuous testing, etc.

Execution can then be specified easily and this lends itself to the automation we seek.

geoffjentry commented 2 years ago

Converting CWL to WDL and vice versa is at best imperfect. There's been some recent workwhich is exciting but I believe it only goes in one direction. My understanding is that Nextflow is moving in a direction that makes direct translation harder, not easier - and there's not been much work in that regard in the first place.

If someone wants to contribute a universal translator between these standards, then I agree w/ your viewpoint.

uniqueg commented 2 years ago

Just for reference:

PRs to include Nextflow and Snakemake in the corresponding TRS enum:

caylward-ilmn commented 2 years ago

I'm reading through the API Spec for the WES POST /runs endpoint, specifically the workflow_type parameter description:

The workflow_type is the type of workflow language and must be "CWL" or "WDL" currently (or another alternative supported by this WES instance).

It seems like this workflow_type parameter for the POST /runs endpoint is not an enum. I guess (as indicated in the above comment) this is really a change to the TRS spec to support "NFL" as a descriptor type?

In the WES GET /service-info endpoint, there is a workflow_type_versions attribute which returns "Available workflow types supported by a given instance of the service". I would think this provides the acceptable values to use in the workflow_type field of the POST /runs endpoint, leaving it up to the implementation to indicate available "workflow_types", rather than the WES spec.

patmagee commented 2 years ago

@caylward-ilmn this may be a case of imprecise language in the documentation. Higher up in the reference docs it also says:

(currently CWL or WDL formatted workflows, other types may be supported in the future)

Which seems to imply that CWL or WDL are the only supported languages currently

caylward-ilmn commented 2 years ago

@patmagee Sure - you're referring to the Executive Summary section (though it DOES say "other types may be supported in the future").

Regardless, that's a non-functional change to fix ancillary docs. Seemed like this issue was suggesting a change to the API specification? That I don't see...

patmagee commented 2 years ago

@caylward-ilmn that caveat (or another alternative supported by this WES instance) does seem to appear everywhere in the docs already, so it may just practically be treated as an enum, when in reality CWL and WDL are just strings that can be anything.

It seems there already is in the specification the flexibility to define any workflow language as "supported" through the workflow_type field. Because of that I do not think adding NEXTFLOW as a documented language changes the specification from the way it currently operates. Issues around input harmonization, and whether adding NEXTFLOW changes the semantics of the API are clearly already present by allowing any language that a WES instance supports

pditommaso commented 2 years ago

Worth mentioning nextflow is already implemented by Tower API, and it's expecting NFL as language type for consistency with TRS

https://tower.nf/openapi/index.html#post-/ga4gh/wes/v1/runs

patmagee commented 2 years ago

If NFL is already used in TRS I do no see any reason to add separate terminology with NEXTFLOW, and we should probably just use the NFL name

jb-adams commented 2 years ago

I agree, wasn't aware of NFL in TRS till now

uniqueg commented 2 years ago

If the docs/specs are to be changed to mention supported workflow languages explicitly, then I think it would be nice if it were the same set of languages supported in TRS. From v2.0.1 onward this will be:

See here.

I am not sure whether the WES specs should include such an enumeration at all, though.

patmagee commented 2 years ago

@uniqueg i think it definitely makes sense to keep language consistent between apis. NFL makes a lot of sense for this.

I am not sure what the process of getting additional languages is supported is. My guess is eventually WES will support snake make, and galaxy as well and the two lists would likely be similar.

uniqueg commented 2 years ago

Actually, I'm not quite sure what exactly it means for WES to support a workflow language - which is why I'm not sure whether languages should even be listed explicitly at all. First of all, I think that WES specs should be abstract enough to enable a wide range of engines to implement them and provide at least a basic feature set. But assuming there are cases where this does not hold, I think it's primarily the engine that would cause compatibility problems, not the language itself. There might be engines where an otherwise "supported language" is not a good match for WES. But I find it quite hard to imagine a case where the language itself would cause problems. As long as a workflow can be addressed by a URI, it should be able to be processed by a suitable engine, or not?

So what are the criteria for a language to be supported by WES? If you mean there should be implementations for a candidate language, then Snakemake should certainly be listed, as there are at least two implementations available that I know of:

Of course, these implementations cannot strictly comply with the specs if the language isn't listed.... ;-)

uniqueg commented 2 years ago

My suggestion would be to list the languages (or better yet the engines, or both) that have been demonstrated to play nice with WES in the documentation, not the specs. This would lower the barrier for workflow engine developers to implement the specs. I feel similarly about TRS, by the way. Let the developers themselves interpret the specs for their use case (language, engine) and decide whether the specs are a good match or not...

patmagee commented 2 years ago

@uniqueg I think that is a good Idea. I was also wondering if instead of WES explicitly stating the languages that it supports, to instead describe what general features a language should require to be a candidate for WES. Similar to your suggestion above this lowers the bar and allows off spec languages

We could add a section something like the following (this is not an actual suggestion on the wording btw):

WES can be used for any workflow language that  adheres to the following general principals
1. A workflow should accept inputs  
2. A single workflow should output a set of outputs
3. A workflow should not require an interactive session with the user but can be executed in a completely headless way
uniqueg commented 2 years ago

I like that! :) Though we will need to check which requirements WES actually really has from the RunRequest model and maybe other more hidden limitations.

Speaking of the RunRequest schema: any particular reason why the required property isn't actually being used to allow automatic validation?