v2.0.0 refactor

Summary

Refactor of the original app for running the TSO500 local app. It is now much faster (down from ~14 hours to ~2.5 hours for a test run) and splits the output files into distinct output fields for downstream automation.

Changes

switch running mode from sequential to scatter-gather (described on page 9 here), using dx sub jobs to run the per-sample analysis, then gathering the outputs and doing the final analysis in the parent job
create distinct output fields for BAMs, VCFs, combinedVariantOutput TSVs, TMB, MSI and metrics output files
tar all intermediate log files and stdout/stderr to speed up upload and reduce the total number of output files
compress all intermediate genome VCFs, which aren't routinely used (and should be compressed anyway because they are VCFs)
add options such as -include_samples and -exclude_samples to control running of the app
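
The scatter-gather change and the sample selection options can be illustrated with a minimal Python sketch. This is not the app's code: the app uses dx sub jobs, whereas the sketch below uses a local thread pool as a stand-in, and `select_samples`, `analyse_sample` and `scatter_gather` are hypothetical names. It also assumes that include and exclude lists may be combined, which the app may not allow.

```python
from concurrent.futures import ThreadPoolExecutor


def select_samples(all_samples, include=None, exclude=None):
    """Filter samples, mimicking the idea of -include_samples / -exclude_samples.

    Assumption: include narrows the list first, then exclude removes from it.
    """
    selected = list(all_samples)
    if include:
        selected = [s for s in selected if s in include]
    if exclude:
        selected = [s for s in selected if s not in exclude]
    return selected


def analyse_sample(sample):
    """Placeholder for the per-sample analysis that a sub job would run."""
    return {"sample": sample, "bam": f"{sample}.bam", "vcf": f"{sample}.vcf.gz"}


def scatter_gather(samples, max_workers=4):
    """Run one task per sample (scatter), then group outputs by type (gather)."""
    # Scatter: each sample is analysed independently, here in a thread pool
    # standing in for one sub job per sample.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(analyse_sample, samples))

    # Gather: collect per-sample outputs into distinct output fields.
    return {
        "bams": [r["bam"] for r in results],
        "vcfs": [r["vcf"] for r in results],
    }


if __name__ == "__main__":
    samples = select_samples(["sample1", "sample2", "sample3"], exclude=["sample2"])
    print(scatter_gather(samples))
```

Because each sample is processed independently, the total run time is bounded by the slowest sample plus the final gather step rather than by the sum over all samples, which is consistent with the speed-up reported in the summary.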

TODO

decide if the final output directory structure is suitable
decide if any more output files need assigning to distinct output fields
decide if any other inputs are needed to modify behaviour

Example job: https://platform.dnanexus.com/projects/GZY91Qj4QJYbJ71xJv6KVyXP/monitor/job/GZxK9Qj4QJYVGB3bpG7qZqBY

Original 002 job for the above run: https://platform.dnanexus.com/projects/GZQQ9204bV2Xzq048x6fq28p/monitor/job/GZQQ92Q4bV2Xzq048x6fq28v

Fixes #4 #5 #7 #9

eastgenomics / eggd_tso500

v2.0.0 refactor #10
