dnanexus / dxWDL

Workflow Description Language compiler for the DNAnexus platform
Apache License 2.0

Note: dxWDL is now in maintenance mode. It will continue to receive bug fixes, but no new features. All new development is happening in the dxCompiler repository.

dxWDL takes a pipeline written in the Workflow Description Language (WDL) and compiles it to an equivalent workflow on the DNAnexus platform. WDL draft-2, version 1.0, and the development version are supported. Note that calls with missing arguments have limited support.

A high-level list of changes between draft-2 and version 1.0 is provided here.
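At the file level, the dialects can be told apart by the version statement: a WDL 1.0 or development file declares its version (for example "version 1.0" or "version development") before any other code, while a draft-2 file has no such statement. The Python sketch below uses a hypothetical helper, wdl_dialect, that guesses the dialect from that statement alone. It is not part of dxWDL and performs no other validation.

```python
import re

# A WDL 1.0 or development file starts (after comments and blank lines)
# with a statement such as "version 1.0"; draft-2 files have none.
VERSION_STMT = re.compile(r"^version\s+(\S+)")


def wdl_dialect(source: str) -> str:
    """Return the declared version (e.g. '1.0', 'development') or 'draft-2'."""
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = VERSION_STMT.match(stripped)
        # Only the first real statement counts.
        return match.group(1) if match else "draft-2"
    return "draft-2"  # empty or comment-only file
```

For example, the bam_chrom_counter workflow in this README has no version statement, so the sketch classifies it as draft-2.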

Setup

Prerequisites: DNAnexus platform account, dx-toolkit, Java 8+, Python 2.7/3.x.

Make sure you've installed the dx-toolkit CLI and initialized it with dx login. Download the latest compiler jar file from the releases page.
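A quick pre-flight check for the tools above is sketched below in Python. The helper names are hypothetical. The sketch assumes java and dx are on the PATH and relies on java -version writing a line such as java version "1.8.0_292" (Java 8 and older) or openjdk version "11.0.2" (Java 9 and later) to standard error. It does not check whether dx login has been run.

```python
import re
import shutil
import subprocess
from typing import List, Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not m:
        return None
    major = int(m.group(1))
    # Java 8 and earlier report "1.x" (e.g. "1.8.0_292");
    # later releases put the major number first (e.g. "11.0.2", "17").
    if major == 1 and m.group(2):
        return int(m.group(2))
    return major


def check_prerequisites() -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if shutil.which("dx") is None:
        problems.append("dx-toolkit CLI ('dx') not found on PATH")
    java = shutil.which("java")
    if java is None:
        problems.append("java not found on PATH")
    else:
        # `java -version` prints to stderr, not stdout.
        out = subprocess.run([java, "-version"], capture_output=True, text=True).stderr
        major = java_major_version(out)
        if major is None or major < 8:
            problems.append(f"need Java 8+, got: {out.strip()[:60]!r}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```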

Example workflow

The bam_chrom_counter workflow is written in WDL. Task slice_bam splits a BAM file into an array of sub-files, one per chromosome. Task count_bam counts the number of alignments in a BAM file. The workflow takes an input BAM file, calls slice_bam to split it into chromosomes, and calls count_bam in parallel on each chromosome. The results are a BAM index file and an array with the number of alignments per chromosome.

workflow bam_chrom_counter {
    File bam

    call slice_bam {
        input : bam = bam
    }
    scatter (slice in slice_bam.slices) {
        call count_bam {
            input: bam = slice
        }
    }
    output {
        slice_bam.bai
        count_bam.count
    }
}

task slice_bam {
    File bam
    Int num_chrom = 22
    command <<<
    set -ex
    samtools index ${bam}
    mkdir slices/
    for i in `seq ${num_chrom}`; do
        samtools view -b ${bam} -o slices/$i.bam $i
    done
    >>>
    runtime {
        docker: "quay.io/ucsc_cgl/samtools"
    }
    output {
        File bai = "${bam}.bai"
        Array[File] slices = glob("slices/*.bam")
    }
}

task count_bam {
    File bam
    command <<<
        samtools view -c ${bam}
    >>>
    runtime {
        docker: "quay.io/ucsc_cgl/samtools"
    }
    output {
        Int count = read_int(stdout())
    }
}
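In WDL, a call inside a scatter runs once per element of the scattered array, and each of its outputs is gathered into an array in the same order as the input elements, so count_bam.count is an Array[Int] at the workflow level. The Python sketch below mimics that dataflow with caller-supplied stand-ins for the two tasks; run_bam_chrom_counter, fake_slice and fake_count are hypothetical names. On DNAnexus the parallelism comes from the platform launching jobs, not from threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def run_bam_chrom_counter(
    bam: str,
    slice_bam: Callable[[str], Tuple[str, List[str]]],
    count_bam: Callable[[str], int],
) -> Tuple[str, List[int]]:
    """Mimic the workflow: one slice_bam call, then count_bam per slice."""
    bai, slices = slice_bam(bam)
    # Scatter: run count_bam on every slice concurrently.
    # Executor.map yields results in input order, matching WDL's gather.
    with ThreadPoolExecutor() as pool:
        counts = list(pool.map(count_bam, slices))
    return bai, counts


if __name__ == "__main__":
    # Stand-in tasks for illustration only; no real BAM files are read.
    def fake_slice(bam):
        return bam + ".bai", [f"slices/{i}.bam" for i in range(1, 4)]

    def fake_count(path):
        return 10 * int(path.split("/")[1].split(".")[0])

    print(run_bam_chrom_counter("sample.bam", fake_slice, fake_count))
    # prints ('sample.bam.bai', [10, 20, 30])
```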

From the command line, we can compile the workflow to the DNAnexus platform using the dxWDL jar file.

$ java -jar dxWDL-0.59.jar compile bam_chrom_counter.wdl -project project-xxxx

This compiles the source WDL file to several platform objects.

These objects are all created in the current dx project and folder. The generated workflow can be run using dx run. For example:

dx run bam_chrom_counter -i0.file=file-xxxx
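When compiling and running from a script, the two commands above can be assembled in Python as sketched below. The helper names are hypothetical, and the argument lists reproduce the commands shown in this section verbatim; the jar version, project-xxxx and file-xxxx are placeholders to replace with your own values. By default the sketch only prints the commands; it executes them when dry_run is False.

```python
import shlex
import subprocess
from typing import List


def compile_cmd(jar: str, wdl: str, project: str) -> List[str]:
    """Argument list for: java -jar <jar> compile <wdl> -project <project>."""
    return ["java", "-jar", jar, "compile", wdl, "-project", project]


def run_cmd(workflow: str, bam_file_id: str) -> List[str]:
    """Argument list for: dx run <workflow> -i0.file=<file id>."""
    return ["dx", "run", workflow, f"-i0.file={bam_file_id}"]


def compile_and_run(jar, wdl, project, workflow, bam_file_id, dry_run=True):
    for cmd in (compile_cmd(jar, wdl, project), run_cmd(workflow, bam_file_id)):
        print("$", shlex.join(cmd))  # show the exact command line
        if not dry_run:
            # check=True stops the script if compilation or launch fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    compile_and_run("dxWDL-0.59.jar", "bam_chrom_counter.wdl",
                    "project-xxxx", "bam_chrom_counter", "file-xxxx")
```

Passing a list rather than a single string to subprocess.run avoids shell quoting issues with file names.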

At runtime, the execution looks like this:

[Figure: the bam_chrom_counter workflow running on the DNAnexus platform]

Strict syntax

One of the compiler phases takes a workflow apart and extracts standalone tasks and sub-workflows. This requires lexical analysis of the WDL program, which currently uses a simple regular expression to detect where each task and workflow starts and ends. As a result, a task must adhere to the following rules:

  1. no extra text is allowed after the final closing bracket
  2. within the task body, closing brackets may not start at the beginning of a line.
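To see why the second rule matters, consider a line-oriented splitter of the kind described above. The Python sketch below is hypothetical, since dxWDL's actual regular expression is not shown here: it assumes a block starts at a line like "task NAME {" or "workflow NAME {" and ends at the first later line that begins with "}". Applied to a task whose inner closing bracket starts a line, it cuts the task short.

```python
import re
from typing import Dict

# Hypothetical patterns, not dxWDL's real ones.
START = re.compile(r"^(task|workflow)\s+(\w+)\s*\{")
END = re.compile(r"^\}")


def split_blocks(source: str) -> Dict[str, str]:
    """Map each task/workflow name to its text, found by line regexes only."""
    blocks: Dict[str, str] = {}
    name, buf = None, []
    for line in source.splitlines():
        if name is None:
            m = START.match(line)
            if m:
                name, buf = m.group(2), [line]
            continue
        buf.append(line)
        if END.match(line):  # any "}" in column 0 is taken as the block's end
            blocks[name] = "\n".join(buf)
            name, buf = None, []
    return blocks


if __name__ == "__main__":
    bad = "task foo {\ninput {\n    File ref\n}\ncommand {\n    ls -lh ~{ref}\n}\n}"
    print(split_blocks(bad)["foo"])  # stops at the input section's "}"
```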

Here is an example to avoid:

task foo {
input {
    File ref
}
command {
    ls -lh ~{ref}
}
}

It should be written like this:

task foo {
    input {
        File ref
    }
    command {
        ls -lh ~{ref}
    }
}
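Before compiling, a file can be screened for both rules with a short check like the Python sketch below. It is a hypothetical helper, not part of dxWDL: it tracks bracket depth inside each top-level task or workflow, reports inner closing brackets in column 0 (rule 2), and reports text after the final closing bracket on the same line (one reading of rule 1). Brackets inside strings or comments are counted too, so treat its output as a hint rather than a verdict.

```python
import re
from typing import List

# Hypothetical start-of-block pattern; dxWDL's real regex is not documented here.
BLOCK_START = re.compile(r"^(task|workflow)\s+\w+\s*\{")


def check_strict_syntax(source: str) -> List[str]:
    """Return one message per line that breaks rule 1 or rule 2."""
    problems: List[str] = []
    depth = 0  # bracket depth inside the current top-level block; 0 = outside
    for lineno, line in enumerate(source.splitlines(), start=1):
        if depth == 0:
            if BLOCK_START.match(line):
                depth = line.count("{") - line.count("}")
            continue
        for col, ch in enumerate(line):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    # Rule 1 (same-line reading): nothing after the final "}".
                    if line[col + 1:].strip():
                        problems.append(f"line {lineno}: text after final closing bracket")
                    break
                if col == 0:
                    # Rule 2: an inner "}" must not start the line.
                    problems.append(f"line {lineno}: inner closing bracket in column 0")
    return problems
```

On the example to avoid, it flags lines 4 and 7; on the corrected version, it reports nothing.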

Additional information

Contributing to dxWDL

See development for more information on how to set up your development environment to contribute to dxWDL and how to test your updates.

Contributions

This software is a community effort! You can browse contributions that are not part of the main dxWDL codebase in our contrib folder, and add your own (see Contributing to dxWDL).

Issues and feature requests

Let us know if you would like to contribute, request a feature, or report a bug.