Workflow Definition Language (WDL) and Common Workflow Language (CWL) are high-level languages for describing how to run a sequence of programs to perform a data analysis task. A workflow consists of a series of steps that are connected by input/output dependencies.
CWL is the product of a community-based open source standards process, and workflows written in CWL are portable across a number of different software platforms (e.g. Arvados, Toil, CWL-Airflow, Seven Bridges). WDL is also open source, but it is based largely around a single implementation (Cromwell). However, some workflows that are important to the bioinformatics community are only maintained in WDL.
The goal of this project is to develop a translator that takes a WDL workflow and produces an equivalent workflow in CWL. When executed with the same input, the translated workflow should produce equivalent results to the original workflow. An ideal demonstration of capability would be to translate the Broad Institute WDL Analysis Research Pipelines (WARP) Whole Genome Germline Single Sample workflow, run it on a scale-out, production CWL runner (such as Arvados or Toil), and show that the results are equivalent.
More background reading on CWL:
This project uses the CWL parser and objects from cwl_utils.parser.cwl_v1_2. miniwdl is used for WDL parsing, and while we target OpenWDL 1.1, earlier versions of (Open)WDL seem to work thanks to the flexibility of the miniwdl parser.
For some discussion comparing the two languages (mainly from the perspective of translating in the other direction, CWL to WDL), see this document:
https://github.com/dnanexus/dxCompiler/blob/main/doc/CWL_v1.2.0_to_WDL_v1.md
Python 3.7+
These instructions assume a Linux / macOS operating system.
git clone https://github.com/common-workflow-lab/wdl-cwl-translator/
cd wdl-cwl-translator
python3 -m venv env
source env/bin/activate
pip install -U pip setuptools wheel
pip install -e .
wdl2cwl path_to_wdl_file
This prints the CWL version to your terminal (stdout). To write it to a file instead, use --output:
wdl2cwl path_to_workflow.wdl --output path_to_new_workflow.cwl
WDL features not yet supported
WDL types not yet supported
OpenWDL 1.1 standard library functions to be implemented
Many of the above are straightforward to implement, but we haven't needed them yet. If you are unable to translate a particular WDL document due to the lack of a standard library function, please open an issue and share your example!
DockerRequirement has no support for dynamic specifications, only fixed values. If a WDL task has a runtime.docker that references an input with a default value, then wdl2cwl does try to copy that default value to the CWL DockerRequirement.dockerPull. If changing the software container is needed, there are several workarounds:
- Some CWL runners (such as cwltool) support overriding requirements at any level at run time. See https://github.com/common-workflow-language/cwltool#overriding-workflow-requirements-at-load-time
- Override the DockerRequirement in hints by specifying your own container at the CWL workflow step level under requirements.
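As a rough illustration of the step-level override, the fragment below pins a different container on a single workflow step. The step name, the align.cwl file, the input and output names, and the image reference are all hypothetical and are not produced by wdl2cwl. The sketch assumes the translated tool lists its DockerRequirement under hints, so that requirements on the enclosing step take precedence.

```yaml
cwlVersion: v1.2
class: Workflow

inputs:
  reads: File            # hypothetical workflow input

outputs:
  aligned:
    type: File
    outputSource: align/output

steps:
  align:
    run: align.cwl       # hypothetical translated tool with DockerRequirement in hints
    requirements:
      DockerRequirement:
        # Hypothetical image; a requirement on the step overrides a hint in the tool.
        dockerPull: example.org/my-aligner:1.0
    in:
      reads: reads
    out: [output]
```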
(Open)WDL assumes that users will configure localization by placing input files in the same directory. Descriptions that require this will need modification before conversion to CWL, as CWL has explicit constructs for achieving localization (secondaryFiles, InitialWorkDirRequirement, and/or explicit staging).
See this example for one method using explicit staging of input files in the command block to achieve the localization required by the tool(s) being called.
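On the CWL side, one possible way to get two inputs into the same directory is InitialWorkDirRequirement. The tool below is only a sketch under stated assumptions: the input names, the output file name, and the choice of samtools idxstats as the command are illustrative and do not come from this project.

```yaml
cwlVersion: v1.2
class: CommandLineTool

requirements:
  InitialWorkDirRequirement:
    listing:
      # Both files are staged side by side in the working directory.
      - $(inputs.alignment)
      - $(inputs.alignment_index)

baseCommand: [samtools, idxstats]
arguments:
  # Refer to the staged copy by its file name in the working directory.
  - $(inputs.alignment.basename)

inputs:
  alignment:
    type: File           # hypothetical input name
  alignment_index:
    type: File           # hypothetical input name

stdout: idxstats.txt     # hypothetical output file name

outputs:
  stats:
    type: stdout
```

When the index is always found next to the main file, declaring it through secondaryFiles on the main input is usually the shorter route.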
If you are converting a WDL workflow to CWL and the original WDL document remains the "source of truth", avoid making manual changes to the CWL, as you will need to maintain those changes whenever the source WDL document(s) change.
Otherwise, for users looking to convert from WDL to CWL and then continue to modify the CWL directly, we have the following advice:
Consider swapping the wdl2cwl translation of the WDL tasks for community maintained CWL descriptions for popular tools when possible. Follow the instructions on usage and update the run line to refer to a local path or a "raw" GitHub URL of the community-maintained tool description. You may need to adjust a few input names to match. Of course, we are happy to receive your enhancements and additional CWL bio* tool descriptions!
For the resulting CWL Workflow and any CWL CommandLineTools not swapped for idiomatic CWL descriptions, consider using the following CWL features absent in WDL:

Workflow Tips
CommandLineTool Tips

- Use secondaryFiles instead of implicit file co-localization for when you have a file and its index(es).
- Add format specifiers to input and output Files and arrays of Files, both at the Workflow and CommandLineTool levels. This helps improve the type checking of the workflow and helps anyone wanting to re-use or adapt the individual CommandLineTools.
- If you set coresMin in ResourceRequirement, consider setting coresMax as well if the tool is known to not benefit from additional cores after a certain amount.
- Use $(runtime.cores) to pass the number of allocated cores to your tools.
- Refer to the designated output directory as $(runtime.outdir).
- Does your tool read from standard input (my_tool < input_file)? You can change the input to be type: stdin instead of type: File and drop the < input_file as a shortcut.
- Describe the software in the DockerRequirement via SoftwareRequirement. This makes for good documentation, helps give credit to the authors of the tool(s), and makes it easier for those who want to run with local software, conda packages, and other non-containerized environments.
- Move environment variable settings (export FOO=bar) present in the script.bash to an EnvVarRequirement. Be careful: if the script.bash runs many commands and the environment variables are not set at the beginning, that may be because they are not appropriate for all the commands. Test to confirm that they are safe to move to an EnvVarRequirement, and if you aren't sure, leave them there. Per-tool invocations with environment variables like FOO=bar name_of_tool.pl --option are also candidates if (1) there are no other tools invoked, (2) they all have the same environment variables set, or (3) the other tools ignore the environment variables.
- Some WDL command sections create output directories and perform other "housekeeping" that is not necessary in CWL, like symlinking files to change names or otherwise arrange the input files. Output directories that themselves don't become a Directory type are likely removable. If a specific arrangement of input files is needed, or additional files need to be created dynamically, then consider using InitialWorkDirRequirement.
- Some WDL command sections include copying input files to obtain writable versions. This can be quite slow on many systems, and from a CWL perspective it is better to use InitialWorkDirRequirement to achieve the same results by marking those inputs as being writable: true.
- If the script.bash (which comes from the WDL command section) meets the following criteria, then consider removing it (and the InitialWorkDirRequirement if otherwise unused) in favor of directly calling your tool using baseCommand with the name of the executable and any static command line arguments, and arguments with the remaining mix of dynamic and static command line arguments:
  - No bash features like for loops and if statements.
  - No ; (which means there are multiple commands on a single line).
  - Any input or output redirection can be replaced with stdin, stdout, and stderr as need be.
  - No pipes (|).
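Several of the tips above can be combined in one tool description. The following is a sketch only: the choice of samtools view, the version number, the container image, the core counts, the FOO variable, and all input and output names are assumptions made for illustration, not output of wdl2cwl.

```yaml
cwlVersion: v1.2
class: CommandLineTool

# Namespace and schema so the `format` identifiers below can be checked.
$namespaces:
  edam: http://edamontology.org/
$schemas:
  - http://edamontology.org/EDAM_1.18.owl

requirements:
  ResourceRequirement:
    coresMin: 2
    coresMax: 8            # hypothetical ceiling beyond which the tool gains nothing
  EnvVarRequirement:
    envDef:
      FOO: bar             # takes the place of `export FOO=bar` in script.bash

hints:
  SoftwareRequirement:
    packages:
      samtools:
        version: ["1.15"]  # hypothetical version
  DockerRequirement:
    dockerPull: example.org/samtools:1.15   # hypothetical image

# Direct invocation instead of a generated script.bash.
baseCommand: [samtools, view]
arguments:
  - --threads
  - $(runtime.cores)       # number of cores actually allocated by the runner
  - -b

inputs:
  alignment:
    type: File
    format: edam:format_2572   # BAM
    secondaryFiles:
      - .bai                   # index travels with the main file
    inputBinding:
      position: 1

stdout: filtered.bam

outputs:
  filtered:
    type: File
    format: edam:format_2572
    outputBinding:
      glob: filtered.bam
```

The type: stdin shortcut and writable: true entries in InitialWorkDirRequirement are left out to keep the sketch short.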
make install-dep
make test # just the unit tests
make help # to list major makefile targets
make diff_pydocstyle_report # run a diff to show how many changes were made in the docstyle
tox # all the code checks
tox -l # list of all configured tox environments
tox -e py39-pydocstyle # perform only pydocstyle tests (py39 is the version of the python interpreter you have installed)
- Generate the CWL file with python wdl2cwl/main.py path_to_wdl_file -o specified_location.
- Add the WDL file to wdl2cwl/tests/wdl_files and the resultant CWL file to wdl2cwl/tests/cwl_files. Include the licence and the original location of the WDL file as a comment at the beginning of the document.
- Add the new test case to wdl2cwl/tests/test_cwl.py as an argument under the @pytest.mark.parametrize() function.
- Run tox, and fix as many issues as you can on your own. make format will fix many things for you!