Add genotyping and filtering option to single-sample workflow

ldgauthier commented 6 years ago

I have a version without imports that works. This version won't validate with womtool, probably because of the imports: ERROR: Cannot find reference to 'CheckContamination' for member access 'CheckContamination.contamination' (line 190, col 47):

Accuracy on the synthetic diploid sample (https://doi.org/10.1101/223297) was compared with that of a ~800-sample callset from production. Sensitivity is very close for SNPs (slightly lower) and better for indels, with more FPs for both SNPs and indels.

97.3% sensitivity for SNPs
99.4% precision for SNPs
65.9% sensitivity for indels*
98.7% precision for indels -- 3.8 FP/Mb

*This dataset excludes 1bp indels, which are the most common and also the easiest to call. It also includes some very large events from PacBio that are not possible to call with Illumina data.

SNP sensitivity is on par with that reported in the SynDip paper. Indel sensitivity is lower.
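For readers less familiar with these metrics, here is a minimal sketch of how sensitivity, precision, and FP/Mb are conventionally derived from true-positive, false-positive, and false-negative counts. The function names and the counts below are hypothetical and for illustration only; they are not taken from this evaluation or from the tool that produced the numbers above.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truth variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of called variants that are in the truth set: TP / (TP + FP)."""
    return tp / (tp + fp)


def fp_per_mb(fp: int, region_bp: int) -> float:
    """False positives per megabase of evaluated territory."""
    return fp / (region_bp / 1_000_000)


if __name__ == "__main__":
    # Hypothetical counts, NOT the counts behind the figures in this thread.
    tp, fp, fn = 9730, 59, 270
    evaluated_bp = 20_000_000  # assumed size of the confident region, in bp

    print(f"sensitivity: {sensitivity(tp, fn):.3f}")
    print(f"precision:   {precision(tp, fp):.3f}")
    print(f"FP/Mb:       {fp_per_mb(fp, evaluated_bp):.2f}")
```

Note that excluding a variant class from the truth set (such as the 1bp indels mentioned above) changes TP and FN together, which is why the indel sensitivity here is not directly comparable to figures computed on a fuller truth set.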

ldgauthier commented 6 years ago

Latest update is working with and without genotyping and filtering. I effectively overrode some of the existing tasks, which is maybe not good WDL style, but I needed new arguments.

For the record, the directory structure of the imports is a big pain for users (like me) running Cromwell in server mode (see https://gatkforums.broadinstitute.org/gatk/discussion/comment/46211). It would be great if all the imports were inside some parent folder that could be easily zipped to submit the subworkflows.
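To illustrate the zipping step: if the imports did live under one parent folder, packaging them could look roughly like the sketch below. The function name `zip_wdl_imports` and the folder layout are hypothetical. It assumes the resulting archive is supplied as the `workflowDependencies` file when submitting to a Cromwell server, and that the `import` paths in the main WDL match the paths stored inside the zip.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_wdl_imports(parent_dir: str, zip_path: str) -> List[str]:
    """Zip every .wdl file under parent_dir, keeping paths relative to it.

    Returns the list of archive member names that were written.
    """
    parent = Path(parent_dir)
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # rglob walks subfolders too, so nested task files are included.
        for wdl in sorted(parent.rglob("*.wdl")):
            arcname = wdl.relative_to(parent).as_posix()
            zf.write(wdl, arcname)
            archived.append(arcname)
    return archived


if __name__ == "__main__":
    # Hypothetical folder and archive names.
    for name in zip_wdl_imports("imports", "imports.zip"):
        print(name)
```

Storing paths relative to the parent folder is the design choice that matters here: an import such as `tasks/qc.wdl` (a made-up example) would then resolve the same way from the zip as it does from a local checkout.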

bshifaw commented 6 years ago

gatk-workflows / five-dollar-genome-analysis-pipeline

Add genotyping and filtering option to single-sample workflow #2