broadinstitute / viral-ngs

Viral genomics analysis pipelines

output viral-ngs version as string in WDL workflows #928

Closed tomkinsc closed 5 years ago

tomkinsc commented 5 years ago

This calls reports.py --version in a new WDL task at the end of each workflow so the viral-ngs version is reported among workflow outputs as a WDL string. It would be more efficient to call reports.py --version within all of the existing tasks to avoid the extra startup overhead of a new task call, but that would add quite a bit of repetition among all of the tasks.

notestaff commented 5 years ago

For workflows run on DNAnexus, the version can be baked into the workflow as a constant, the way the docker tag is. It is also currently possible to determine the version from 'dx describe analysis-xxxxx' output. It might also make sense to attach the viral-ngs version to the created workflows and apps as DNAnexus properties, and/or to create global workflows which are explicitly versioned.

One other option is to use git smudge filters to bake the git hash as a constant into the WDL files. Not sure if dxWDL would be smart enough to avoid spinning up an extra job in that case.

dpark01 commented 5 years ago

Yeah been thinking about this a little... agree with the need to have this version more clearly stamped in a way we can access. A few thoughts:

Just a note that I still often ponder a future in which we don't need to have the exact same docker image for each WDL task (we could then strip the images down quite a bit since they don't have to be so monolithic), but I'm not sure if that future is compatible with retaining compatibility with Snakemake... anyway, a bit off-topic.

tomkinsc commented 5 years ago

Switching to containers may not break Snakemake after all: https://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html#singularity

notestaff commented 5 years ago

Besides slimming down the images, per-task Docker images would enable much more reuse of past results when re-running old analyses with newer viral-ngs versions. E.g. assemble_denovo_with_deplete could reuse the deplete part if only the assemble code changed.
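To illustrate the reuse argument, here is a toy model (not how any particular execution engine actually decides on reuse) in which a task result is reused only when its image, command, and inputs are all unchanged. The image names, tags, and the cache_key helper are hypothetical.

```python
import hashlib
import json


def cache_key(docker_image: str, command: str, inputs: dict) -> str:
    """Toy cache key: a hash over the image, the command, and the inputs."""
    payload = json.dumps(
        {"image": docker_image, "command": command, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


inputs = {"reads": "sample.bam"}  # hypothetical input

# One monolithic image: any release changes the tag, so the deplete
# step gets a new key even though only the assemble code changed.
mono_old = cache_key("viral-ngs:1.0", "deplete", inputs)
mono_new = cache_key("viral-ngs:1.1", "deplete", inputs)

# Per-task images: the deplete image tag is untouched by an assemble-only
# change, so the key (and the cached result) stays valid.
per_task_old = cache_key("viral-ngs-deplete:1.0", "deplete", inputs)
per_task_new = cache_key("viral-ngs-deplete:1.0", "deplete", inputs)

print("monolithic image reusable:", mono_old == mono_new)
print("per-task image reusable:", per_task_old == per_task_new)
```

Under this model, the deplete step of assemble_denovo_with_deplete would be a cache hit only in the per-task case.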

tomkinsc commented 5 years ago

Thanks for the review, everyone. The latest commit switches to a per-task version string that is filled in by a sed replacement. Here's an example CI execution of a workflow, with the output string shown. There's some redundancy since each task outputs the same value, but this would allow us to mix and match task versions in workflows while keeping track of their source versions.
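
A rough sketch of what that substitution step amounts to, written in Python rather than sed so it can be run anywhere. The placeholder string, the task, and the output name are all assumptions made for illustration; the thread does not show what the commit actually uses.

```python
# Hypothetical placeholder; the real string used in the commit is not shown in this thread.
PLACEHOLDER = "viral-ngs_version_unknown"

# Hypothetical WDL task, included only so the substitution has something to act on.
WDL_TEMPLATE = '''\
task deplete {
  command {
    echo "depleting"
  }
  output {
    String viralngs_version = "viral-ngs_version_unknown"
  }
}
'''


def stamp_version(wdl_text: str, version: str) -> str:
    """Replace every occurrence of the placeholder with the given version."""
    return wdl_text.replace(PLACEHOLDER, version)


# "v1.2.3" is an example value, not a real release tag.
stamped = stamp_version(WDL_TEMPLATE, "v1.2.3")
print(stamped)
```

Because the string is a literal in each task's output block, no extra job has to start just to report the version, and tasks taken from different releases would each carry their own value.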