Refactor of the original app to run the TSO500 local app. It is now much faster (down from ~14 hours to 2.5 hours for a test run) and splits the output files into distinct output fields for downstream automation.
Changes
switch running mode from sequential to scatter-gather (described on page 9 here), using dx sub jobs to run the per-sample analysis, then gathering outputs and doing the final analysis in the parent job
create distinct output fields for bams, vcfs, combinedVariantOutput tsvs, TMB, MSI and metrics output files
tar all intermediate log files and stdout/stderr to speed up upload and reduce the total number of output files
compress all intermediate genome VCFs, which aren't routinely used (and should be compressed anyway because they are VCFs)
add options such as -include_samples and -exclude_samples to control running of the app
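The scatter-gather flow and sample selection listed above can be sketched roughly as follows. This is an illustration only, not the app's code: a local thread pool stands in for dx sub jobs, `select_samples`, `analyse_sample` and `scatter_gather` are hypothetical names, and the assumption that the include and exclude lists are mutually exclusive is mine rather than something stated in this PR.

```python
from concurrent.futures import ThreadPoolExecutor


def select_samples(samples, include=None, exclude=None):
    """Filter sample names (assumed semantics: include and exclude are mutually exclusive)."""
    if include and exclude:
        raise ValueError("use include or exclude, not both")
    if include:
        return [s for s in samples if s in include]
    if exclude:
        return [s for s in samples if s not in exclude]
    return list(samples)


def analyse_sample(sample):
    """Hypothetical placeholder for the per-sample analysis a sub job would run."""
    return {"sample": sample, "bam": f"{sample}.bam", "vcf": f"{sample}.vcf.gz"}


def scatter_gather(samples, max_workers=4):
    """Scatter: one task per sample. Gather: group outputs by type in the parent."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of samples
        results = list(pool.map(analyse_sample, samples))
    return {
        "bams": [r["bam"] for r in results],
        "vcfs": [r["vcf"] for r in results],
    }


if __name__ == "__main__":
    samples = select_samples(["s1", "s2", "s3"], exclude=["s2"])
    print(scatter_gather(samples))
```

Grouping the gathered results by file type mirrors the idea of distinct output fields, so a downstream step can take all BAMs or all VCFs without parsing a directory tree.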
TODO
decide if final output directory structure is suitable
decide if any more output files need assigning to distinct output fields
decide if any other inputs are needed to modify behaviour
Example job: https://platform.dnanexus.com/projects/GZY91Qj4QJYbJ71xJv6KVyXP/monitor/job/GZxK9Qj4QJYVGB3bpG7qZqBY
Original 002 job for the above run: https://platform.dnanexus.com/projects/GZQQ9204bV2Xzq048x6fq28p/monitor/job/GZQQ92Q4bV2Xzq048x6fq28v
Fixes #4 #5 #7 #9