broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk
Other
1.69k stars 588 forks source link

Look into adaptive pruning in GATK 4.2.0.0 #7232

Open GATKSupportTeam opened 3 years ago

GATKSupportTeam commented 3 years ago

Mutect2 Adaptive Pruning issue as discussed in GATK OH meeting. Here is the original post:

This request was created from a contribution made by Nabeel Ahmed on April 07, 2021 09:13 UTC.

Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/360077647812-Why-do-a-clear-expected-variant-not-show-up-in-the-Mutect2-vcf-file

--

I am running Mutect2 on a sample in tumor-only mode. This sample has mutations introduced and known to be true positive calls. However, I am unable to detect some of these calls in the vcf file after Mutect2 is run that have very clear read support as seen in IGV. I have used the –bam-output option to show the output bam and in IGV, it shows that there is no assembly in this region and no mutation event was detected. I am showing the IGV screenshot for one of such calls (chr12:25398285).

I am using the latest version GATK 4.2.0.0 and the following is the full Mutect2 command from the log file

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.2.0.0-local.jar Mutect2 -R ../resources/hg19.fa -L ../resources/coding_regions.bed -I bam_files/sample1.bam --pon ../resources/pon.vcf.gz --germline-resource ../resources/af-only-gnomad.raw.sites.hg19.vcf.gz --bam-output sample1.mutect2_out.bam --recover-all-dangling-branches true -min-pruning 1 --min-dangling-branch-length 2 --debug --max-reads-per-alignment-start 0 --genotype-pon-sites True --f1r2-tar-gz vcf_files/f1r2.sample1.tar.gz -O vcf_files/unfiltered.sample1.vcf  

In the debug mode, the following log messages are generated for this region

08:01:26.086 INFO  Mutect2Engine - Assembling chr12:25398242-25398320 with 14298 reads:    (with overlap region = chr12:25398142-25398420)

I have another call with similar VAF that is detected in the vcf output(chr12:25380275).

chr12 25380275   .    T    G    .    .     AS_SB_TABLE=3911,5343|26,21;DP=9485;ECNT=1;MBQ=36,36;MFRL=0,0;MMQ=42,42;MPOS=18;POPAF=7.30;TLOD=53.53     GT:AD:AF:DP:F1R2:F2R1:SB   0/1:9254,47:4.970e-03:9301:5321,21:3867,26:3911,5343,26,21

The input and the output BAMs show this call with the variant.

In the logs, it shows the detection of an active region here:

08:01:23.642 INFO  Mutect2Engine - Assembling chr12:25380238-25380327 with 19912 reads:    (with overlap region = chr12:25380138-25380427)

08:01:24.119 INFO  EventMap - >> Events = EventMap{chr12:25380275-25380275 [T*, G],}

08:01:24.154 INFO  AssemblyResultSet - Trimming active region AssemblyRegion chr12:25380238-25380327 active?=true nReads=19912 with 2 haplotypes

08:01:24.154 INFO  AssemblyResultSet - Trimmed region to chr12:25380255-25380295 and reduced number of haplotypes from 2 to only 2

08:01:25.383 INFO  EventMap - >> Events = EventMap{chr12:25380275-25380275 [T*, G],}

I have tried troubleshooting with the steps stated in this [blog](/hc/en-us/articles/360043491652-When-HaplotypeCaller-and-Mutect2-do-not-call-an-expected-variant). However, it does not change the output vcf. I used the force-calling mode by giving the above call in an input vcf and the call did appear in the vcf file.

chr12 25398285   .    C    A    .    .     AS_SB_TABLE=4312,3630|14,8;DP=8096;ECNT=1;MBQ=36,36;MFRL=0,0;MMQ=42,42;MPOS=22;POPAF=7.30;TLOD=14.69     GT:AD:AF:DP:F1R2:F2R1:SB   0/1:7942,22:2.576e-03:7964:3609,8:4268,13:4312,3630,14,8

However, I cannot rely on force-calling mutations on a set of calls. I am unsure if I am missing out more calls that are not showing up. Are there any parameters I need to tune so that I do not miss calls like above?

(created from Zendesk ticket #136765)
gz#136765

nabeel-bioinfo commented 3 years ago

Thank you for taking a look into this. I followed recommendation of @gbrandt6 and reduced the --pruning-lod-threshold but this call is still unable to make it to the output of Mutect2. I tried different thresholds from 1.3, 0.7, 0.5 and even 0.1 but it did not lead to any difference in detecting this call.