epam / fonda

Fonda is a framework which offers scalable and automatic analysis of multiple NGS sequencing data types
Apache License 2.0

Implement new mutect2 variant calling strategy #195

Closed syansanofi closed 3 years ago

syansanofi commented 3 years ago

Issue

Update Mutect2 variant calling to follow the best practices recommended for the latest GATK release (4.2.0.0).

Approach

This change requires additional germline references. Many of them already appear in the mutect1 list, but not all are listed.

Important Points

syansanofi commented 3 years ago

Mutect2 template

https://github.com/epam/fonda/blob/4a651caa0ab4bdb4ff92516d2294331c9723f134/src/main/resources/templates/mutect2_template.txt#L2

This line should probably be changed to the following:

gatk --java-options $javaOptions Mutect2 -I $controlBam -I $bam -normal $controlSampleName -R $refGenome -L $bed -pon $panelOfNormal --germline-resource $germlineResource --bam-output $bamout --f1r2-tar-gz ${f1r2.tar.gz} -O $vcf

Refs https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2
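
As a rough illustration of how the proposed line expands once the template variables are filled in, the sketch below assembles the same argument list in Python. The function name `build_mutect2_command`, its parameter names, and every path and sample name are hypothetical placeholders; only the GATK flags are taken from the command above.

```python
import shlex


def build_mutect2_command(java_options, control_bam, tumor_bam, control_sample_name,
                          ref_genome, bed, panel_of_normal, germline_resource,
                          bamout, f1r2_tar_gz, vcf):
    """Return the proposed Mutect2 invocation as an argument list (hypothetical helper)."""
    return [
        "gatk", "--java-options", java_options, "Mutect2",
        "-I", control_bam,                 # normal (control) BAM
        "-I", tumor_bam,                   # tumor BAM
        "-normal", control_sample_name,    # sample name of the control BAM
        "-R", ref_genome,
        "-L", bed,
        "-pon", panel_of_normal,
        "--germline-resource", germline_resource,
        "--bam-output", bamout,            # later sorted with SortSam
        "--f1r2-tar-gz", f1r2_tar_gz,      # later read by LearnReadOrientationModel
        "-O", vcf,
    ]


# Placeholder values for illustration only; none of them come from Fonda.
command = build_mutect2_command(
    java_options="-Xmx8g",
    control_bam="normal.bam",
    tumor_bam="tumor.bam",
    control_sample_name="NORMAL",
    ref_genome="ref.fa",
    bed="targets.bed",
    panel_of_normal="pon.vcf.gz",
    germline_resource="germline.vcf.gz",
    bamout="tumor.bamout.bam",
    f1r2_tar_gz="tumor.f1r2.tar.gz",
    vcf="tumor.vcf.gz",
)
print(shlex.join(command))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if a path contains spaces.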

syansanofi commented 3 years ago

GetPileupSummaries

New template for the GetPileupSummaries tool. This tool produces the pileup tables used by the CalculateContamination tool.

gatk --java-options $javaOptions GetPileupSummaries -I $bam -L $bed -R $refGenome -V $contaminationVCF --sequence-dictionary $sequenceDictionary --interval-set-rule INTERSECTION -O $tumorPileupTable

syansanofi commented 3 years ago

CalculateContamination

New template for the CalculateContamination tool. This tool is the spiritual successor to ContEst from older GATK versions.

gatk --java-options $javaOptions CalculateContamination -I $tumorPileupTable -L $bed --tumor-segmentation $segmentsTable -O $contaminationTable

syansanofi commented 3 years ago

SortSam for bamout

SortSam is applied to the Mutect2 bamout so that later stages receive a coordinate-sorted BAM.

gatk --java-options $javaOptions SortSam -I $bamout -O $bamoutSorted -SO coordinate

syansanofi commented 3 years ago

LearnReadOrientationModel

New template for LearnReadOrientationModel.

gatk --java-options $javaOptions LearnReadOrientationModel -I ${f1r2.tar.gz} -O ${artifactsPriors.tar.gz}

syansanofi commented 3 years ago

FilterMutectCalls

New template for FilterMutectCalls.

gatk --java-options $javaOptions FilterMutectCalls -R $refGenome -V $vcf -O $filteredVCF --contamination-table $contaminationTable --tumor-segmentation $segmentsTable --ob-priors ${artifactsPriors.tar.gz} -stats $vcfStats --filtering-stats $filteringStats

syansanofi commented 3 years ago

FilterAlignmentArtifacts

New template for FilterAlignmentArtifacts, which applies further filtering after FilterMutectCalls.

gatk --java-options $javaOptions FilterAlignmentArtifacts -I $bamoutSorted -V $filteredVCF -O $filteredVCFartifacts -R $refGenome --bwa-mem-index-image $bwaImage
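
The templates in this thread feed into each other, so the order in which they run matters. The sketch below records, for each step, which template variables its command reads and writes, then checks that every input is available by the time it is needed. The data layout and the function name `missing_inputs` are assumptions made for illustration and are not part of Fonda. It also assumes that `$vcfStats` is the stats file Mutect2 writes next to its output VCF.

```python
# (step name, variables read, variables written), in the order the comments appear.
STEPS = [
    ("Mutect2",
     {"controlBam", "bam", "refGenome", "bed", "panelOfNormal", "germlineResource"},
     {"bamout", "f1r2.tar.gz", "vcf", "vcfStats"}),  # vcfStats: assumed Mutect2 side output
    ("GetPileupSummaries",
     {"bam", "bed", "refGenome", "contaminationVCF", "sequenceDictionary"},
     {"tumorPileupTable"}),
    ("CalculateContamination",
     {"tumorPileupTable", "bed"},
     {"segmentsTable", "contaminationTable"}),
    ("SortSam", {"bamout"}, {"bamoutSorted"}),
    ("LearnReadOrientationModel", {"f1r2.tar.gz"}, {"artifactsPriors.tar.gz"}),
    ("FilterMutectCalls",
     {"refGenome", "vcf", "contaminationTable", "segmentsTable",
      "artifactsPriors.tar.gz", "vcfStats"},
     {"filteredVCF", "filteringStats"}),
    ("FilterAlignmentArtifacts",
     {"bamoutSorted", "filteredVCF", "refGenome", "bwaImage"},
     {"filteredVCFartifacts"}),
]

# Inputs that must exist before the pipeline starts.
EXTERNAL = {"controlBam", "bam", "refGenome", "bed", "panelOfNormal",
            "germlineResource", "contaminationVCF", "sequenceDictionary", "bwaImage"}


def missing_inputs(steps, external):
    """Return (step, missing variables) pairs for steps run before their inputs exist."""
    available = set(external)
    problems = []
    for name, reads, writes in steps:
        missing = sorted(reads - available)
        if missing:
            problems.append((name, missing))
        available |= writes
    return problems


print(missing_inputs(STEPS, EXTERNAL))
```

An empty result means the listed order is runnable as is. Moving FilterMutectCalls ahead of CalculateContamination or LearnReadOrientationModel would be reported as a problem, because its contamination table, segments table, and orientation priors would not exist yet.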