broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Newer versions of Mutect2 miss low-AF variants #7015

Open gabeng opened 3 years ago

gabeng commented 3 years ago

Bug Report

Affected tool(s) or class(es)

Mutect2

Affected version(s)

4.1.9.0 (compared against 4.0.12.0)

Description

I am evaluating Mutect2 variant-calling performance on Genome in a Bottle (GIAB) mixtures (target capture, no UMIs, 2000x average coverage), comparing 4.0.12.0 against 4.1.9.0 with default parameters. The data below come from a representative sample. 4.1.9.0 misses variants that 4.0.12.0 was able to call. When a reference VCF is supplied with the --alleles option, 4.1.9.0 does detect these variants, with decent quality scores. It is unclear why 4.1.9.0 does not make these calls by default, or whether modifying input parameters would recover them. Unlike in https://github.com/broadinstitute/gatk/issues/6724, the variants were still not called when the --force-active option was set.

These are the variants that 4.1.9.0 calls only when the reference VCF is supplied via --alleles:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Sample
2   25458546    .   C   T   .   .   AS_SB_TABLE=723,503|25,14;DP=1302;ECNT=1;MBQ=20,20;MFRL=189,190;MMQ=60,60;MPOS=36;POPAF=7.3;TLOD=61.58  GT:AD:AF:DP:F1R2:F2R1:SB    0/1:1226,39:0.033:1265:576,19:554,19:723,503,25,14
4   55152040    .   C   T   .   .   AS_SB_TABLE=1102,1078|15,13;DP=2349;ECNT=2;MBQ=20,20;MFRL=180,164;MMQ=60,60;MPOS=35;POPAF=7.3;TLOD=31.85    GT:AD:AF:DP:F1R2:F2R1:SB    0/1:2180,28:0.012:2208:1003,16:1104,10:1102,1078,15,13
5   170833472   .   AAT A   .   .   AS_SB_TABLE=201,501|7,17;DP=750;ECNT=1;MBQ=20,26;MFRL=203,209;MMQ=60,60;MPOS=20;POPAF=7.3;RPA=2,1;RU=AT;STR;TLOD=45.4   GT:AD:AF:DP:F1R2:F2R1:SB    0/1:702,24:0.035:726:329,12:311,12:201,501,7,17
7   101844851   .   A   G   .   .   AS_SB_TABLE=1022,1178|25,25;DP=2406;ECNT=1;MBQ=20,20;MFRL=189,198;MMQ=60,60;MPOS=46;POPAF=7.3;TLOD=65.52    GT:AD:AF:DP:F1R2:F2R1:SB    0/1:2200,50:0.021:2250:854,29:906,19:1022,1178,25,25
7   101916798   .   C   A   .   .   AS_SB_TABLE=91,916|1,37;DP=1060;ECNT=1;MBQ=32,32;MFRL=213,195;MMQ=60,60;MPOS=26;POPAF=7.3;TLOD=54.92    GT:AD:AF:DP:F1R2:F2R1:SB    0/1:1007,38:0.033:1045:438,17:511,18:91,916,1,37
7   148506396   .   A   C   .   .   AS_SB_TABLE=990,908|18,18;DP=1981;ECNT=1;MBQ=20,20;MFRL=193,203;MMQ=60,60;MPOS=35;POPAF=7.3;TLOD=48.03  GT:AD:AF:DP:F1R2:F2R1:SB    0/1:1898,36:0.019:1934:917,15:842,17:990,908,18,18
11  108175462   .   G   A   .   .   AS_SB_TABLE=682,593|10,8;DP=1354;ECNT=2;MBQ=20,20;MFRL=191,177;MMQ=60,60;MPOS=36;POPAF=7.3;TLOD=16.27   GT:AD:AF:DP:F1R2:F2R1:PGT:PID:PS:SB 1|0:1275,18:0.011:1293:597,8:622,9:1|0:108175394_T_C:108175394:682,593,10,8
12  12037318    .   C   G   .   .   AS_SB_TABLE=686,588|12,7;DP=1365;ECNT=1;MBQ=20,20;MFRL=188,214;MMQ=60,60;MPOS=45;POPAF=7.3;TLOD=18.93   GT:AD:AF:DP:F1R2:F2R1:SB    0/1:1274,19:0.014:1293:599,5:603,10:686,588,12,7
15  66679819    .   G   C   .   .   AS_SB_TABLE=456,686|11,12;DP=1220;ECNT=1;MBQ=20,20;MFRL=201,184;MMQ=60,60;MPOS=49;POPAF=7.3;TLOD=30.97  GT:AD:AF:DP:F1R2:F2R1:SB    0/1:1142,23:0.022:1165:515,15:542,8:456,686,11,12
18  42532923    .   T   C   .   .   AS_SB_TABLE=951,1013|14,19;DP=2062;ECNT=1;MBQ=20,32;MFRL=183,217;MMQ=60,60;MPOS=43;POPAF=7.3;TLOD=48.91 GT:AD:AF:DP:F1R2:F2R1:SB    0/1:1964,33:0.019:1997:909,15:910,15:951,1013,14,19
20  31019360    .   AT  A   .   .   AS_SB_TABLE=1027,1027|31,25;DP=2202;ECNT=1;MBQ=20,20;MFRL=187,169;MMQ=60,60;MPOS=45;POPAF=7.3;RPA=7,6;RU=T;STR;TLOD=17.82   GT:AD:AF:DP:F1R2:F2R1:SB    0/1:2054,56:0.021:2110:965,37:987,19:1027,1027,31,25

This is how 4.0.12.0 calls these variants with standard settings:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Sample
2   25458546    .   C   T   .   .   DP=880;ECNT=1;MBQ=36,36;MFRL=206,204;MMQ=60,60;MPOS=30;POPAF=7.3;TLOD=46.19 GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:821,25:0.031:846:418,12:403,13:0.02,0.03,0.03:0.003181,0.001768,0.995
4   55152040    .   C   T   .   .   DP=1582;ECNT=2;MBQ=36,36;MFRL=192,175;MMQ=60,60;MPOS=33;POPAF=7.3;TLOD=25.15    GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:1425,18:0.012:1443:688,9:737,9:0.01,0.01,0.012:0.001817,0.000804,0.997
5   170833472   .   AAT A   .   .   DP=596;ECNT=1;MBQ=36,36;MFRL=211,219;MMQ=60,60;MPOS=21;POPAF=7.3;RPA=2,1;RU=AT;STR;TLOD=42.8    GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:561,19:0.034:580:288,10:273,9:0.03,0.03,0.033:0.006381,0.002272,0.991
7   101844851   .   A   G   .   .   DP=1820;ECNT=1;MBQ=36,36;MFRL=204,203;MMQ=60,60;MPOS=46;POPAF=7.3;TLOD=60.08    GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:1630,38:0.022:1668:799,23:831,15:0.02,0.02,0.023:0.003057,0.0008676,0.996
7   101916798   .   C   A   .   .   DP=1016;ECNT=1;MBQ=36,36;MFRL=212,202;MMQ=60,60;MPOS=27;POPAF=7.3;TLOD=60.48    GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:964,36:0.035:1000:440,19:524,17:0.04,0.02,0.036:0.003131,0.007434,0.989
7   148506396   .   A   C   .   .   DP=1399;ECNT=1;MBQ=36,36;MFRL=209,211;MMQ=60,60;MPOS=34;POPAF=7.3;TLOD=42.57    GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:1327,27:0.019:1354:672,12:655,15:0.02,0.02,0.02:0.0007239,0.005058,0.994
11  108175462   .   G   A   .   .   DP=972;ECNT=1;MBQ=36,36;MFRL=209,206;MMQ=60,60;MPOS=37;POPAF=7.3;TLOD=15.55 GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:898,12:0.013:910:446,6:452,6:0.01,0.01,0.013:0.002966,0.0009292,0.996
12  12037318    .   C   G   .   .   DP=975;ECNT=1;MBQ=36,36;MFRL=205,218;MMQ=60,60;MPOS=48;POPAF=7.3;TLOD=15.41 GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:893,12:0.013:905:438,4:455,8:0.01,0.01,0.013:0.004146,0.0007942,0.995
15  66679819    .   G   C   .   .   DP=870;ECNT=1;MBQ=36,36;MFRL=215,211;MMQ=60,60;MPOS=52;POPAF=7.3;TLOD=25.62 GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:797,17:0.022:814:384,11:413,6:0.02,0.02,0.021:0.004622,0.001165,0.994
18  42532923    .   T   C   .   .   DP=1402;ECNT=1;MBQ=36,36;MFRL=197,222;MMQ=60,60;MPOS=51;POPAF=7.3;TLOD=41.21    GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:1312,25:0.019:1337:681,13:631,12:0.02,0.01,0.019:0.0009167,0.002842,0.996
20  31019360    .   AT  A   .   .   DP=1530;ECNT=1;MBQ=36,36;MFRL=198,199;MMQ=60,60;MPOS=44;POPAF=7.3;RPA=7,6;RU=T;STR;TLOD=30.65   GT:AD:AF:DP:F1R2:F2R1:SAAF:SAPP 0/1:1424,34:0.022:1458:673,20:751,14:0.02,0.02,0.023:0.003063,0.0009654,0.996
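
To compare the two tables above field by field, a small awk filter can pull out the depth, TLOD, alternate read count, and allele fraction of each record. This is only a minimal sketch and not part of GATK: the function name `summarize_mutect2` is hypothetical, and it assumes single-sample, biallelic records with AD and AF in the FORMAT column, as in the records shown here.

```shell
# Hypothetical helper (not a GATK tool): for each record of a single-sample
# Mutect2 VCF read from stdin, print CHROM, POS, INFO/DP, INFO/TLOD,
# the alternate read count from AD, and AF, separated by tabs.
summarize_mutect2() {
  awk -v OFS='\t' '
    /^#/ || NF < 10 { next }             # skip header lines and incomplete records
    {
      dp = tlod = ad = af = "."
      n = split($8, info, ";")           # INFO column holds key=value pairs
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^DP=/)   dp   = substr(info[i], 4)
        if (info[i] ~ /^TLOD=/) tlod = substr(info[i], 6)
      }
      m = split($9, keys, ":")           # FORMAT column names the sample subfields
      split($10, vals, ":")
      for (i = 1; i <= m; i++) {
        if (keys[i] == "AD") { split(vals[i], counts, ","); ad = counts[2] }
        if (keys[i] == "AF") af = vals[i]
      }
      print $1, $2, dp, tlod, ad, af
    }
  '
}
```

It could be run as `gzip -dc unfiltered_alleles.vcf.gz | summarize_mutect2`. For the first record (2:25458546), the forced call in 4.1.9.0 reports 39 alternate reads at AF 0.033 (DP=1302, TLOD=61.58), against 25 alternate reads at AF 0.031 (DP=880, TLOD=46.19) in 4.0.12.0.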

Steps to reproduce

# Without the --alleles option, the variants are not called.
gatk Mutect2 --reference GRCh37.fa --read-validation-stringency LENIENT -I subset_properheader.bam -L enrichment.bed --interval-set-rule INTERSECTION -O unfiltered_noalleles.vcf.gz
# With the --alleles option, the variants are called, with presumably high quality scores.
gatk Mutect2 --reference GRCh37.fa --read-validation-stringency LENIENT -I subset_properheader.bam -L enrichment.bed --interval-set-rule INTERSECTION -O unfiltered_alleles.vcf.gz --alleles truth_small_variants_NA12878-NA24385-mix_sorted.vcf.gz
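
The sites that differ between the two runs can be listed by comparing site keys from the two output files. The sketch below is one way to do that with standard tools only. The function names `vcf_site_keys` and `sites_only_in_first` are hypothetical, and the comparison matches on CHROM, POS, REF, and ALT as written, without normalizing indel representation, so it is meant as a quick check rather than a full VCF comparison.

```shell
# Hypothetical helper: print a sorted, de-duplicated list of
# CHROM:POS:REF:ALT keys for every record of a gzip-compressed VCF.
vcf_site_keys() {
  gzip -dc "$1" \
    | awk '!/^#/ && NF >= 5 { print $1 ":" $2 ":" $4 ":" $5 }' \
    | LC_ALL=C sort -u
}

# Hypothetical helper: print the sites present in the first VCF
# but absent from the second.
sites_only_in_first() {
  first_keys=$(mktemp) || return 1
  second_keys=$(mktemp) || return 1
  vcf_site_keys "$1" > "$first_keys"
  vcf_site_keys "$2" > "$second_keys"
  # comm -23 keeps lines unique to the first file; both inputs are sorted in the C locale
  LC_ALL=C comm -23 "$first_keys" "$second_keys"
  rm -f "$first_keys" "$second_keys"
}
```

With the output names used in the commands above, `sites_only_in_first unfiltered_alleles.vcf.gz unfiltered_noalleles.vcf.gz` would list the sites that appear only when --alleles is given.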

The BAM, BED and output VCF files are available for download here.

droazen commented 3 years ago

@davidbenjamin @fleharty ^^^