Illumina / strelka

Strelka2 germline and somatic small variant caller
GNU General Public License v3.0

How to deal with some controversial somatic and germline variants #129

Open qindan2008 opened 5 years ago

qindan2008 commented 5 years ago

For a normal-tumor sample pair, when I ran Strelka2 in somatic mode, I got a somatic variant for which the variant allele frequency in the normal sample was 0.0526.

1 241680511 . T TA . PASS SOMATIC;QSI=99;TQSI=1;NT=ref;QSI_NT=76;TQSI_NT=2;SGT=ref->het;MQ=60.00;MQ0=0;RU=A;RC=3;IC=4;IHP=3;SomaticEVS=9.50;EVSF=76,0.070186,0.80947,4,3,3,1,1.8525,2.7668,-4.1217,0.052632,0.83721,0.48649,0.13131,0.10811,0.010989,0.070186,-0.49174,0.49174 DP:DP2:TAR:TIR:TOR:DP50:FDP50:SUBDP50:BCN50 37:37:18,21:1,2:18,16:37.15:4.78:0.00:0.11 98:98:14,14:72,75:13,11:91.30:1.07:0.00:0.01
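The 0.0526 figure can be reproduced from the record above. The following is a minimal sketch, assuming the indel allele frequency is taken as tier-1 TIR / (tier-1 TAR + tier-1 TIR), where the first comma-separated value of each field is the tier-1 count. The helper name `strelka_indel_vaf` is hypothetical and not part of Strelka2.

```python
import math


def strelka_indel_vaf(format_keys: str, sample: str) -> float:
    """Tier-1 indel allele frequency from a Strelka2 somatic sample column.

    Assumption: VAF = tier1 TIR / (tier1 TAR + tier1 TIR).
    """
    fields = dict(zip(format_keys.split(":"), sample.split(":")))
    ref_reads = int(fields["TAR"].split(",")[0])  # tier-1 reads supporting the reference
    alt_reads = int(fields["TIR"].split(",")[0])  # tier-1 reads supporting the indel
    total = ref_reads + alt_reads
    return alt_reads / total if total else math.nan


# FORMAT and sample columns copied from the somatic record above.
FORMAT = "DP:DP2:TAR:TIR:TOR:DP50:FDP50:SUBDP50:BCN50"
NORMAL = "37:37:18,21:1,2:18,16:37.15:4.78:0.00:0.11"
TUMOR = "98:98:14,14:72,75:13,11:91.30:1.07:0.00:0.01"

normal_vaf = strelka_indel_vaf(FORMAT, NORMAL)  # 1 / (18 + 1)
tumor_vaf = strelka_indel_vaf(FORMAT, TUMOR)    # 72 / (14 + 72)

print(f"normal VAF = {normal_vaf:.4f}")
print(f"tumor VAF  = {tumor_vaf:.4f}")
```

With these counts the normal sample has 1 supporting read out of 19 and the tumor has 72 out of 86, which agrees with the 0.052632 and 0.83721 values that appear in the EVSF list of the same record.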

When I ran Strelka2 in paired normal-tumor germline mode, the NORMAL sample at this site was flagged "PASS" by Strelka2.

1 241680511 . T TA 1361 NoPassedVariantGTs CIGAR=1M1I;RU=A;REFREP=3;IDREP=4;MQ=60 GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL 0/0:18:18:37:20,2:10,2:10,0:PASS:0,21,340 0/1:11:11:98:14,74:5,35:9,39:LowGQX:999,0,8

But when I ran Strelka2 in germline mode on the normal sample alone, I got no variant at this site.

I tried Mutect2 and HaplotypeCaller to further confirm whether the site is somatic or germline. Mutect2 returned no variant at this site. HaplotypeCaller showed a variant allele frequency of 0.4324324 in the normal, which indicates a germline variant.

1 241680511 . T TA 418.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.828;DP=42;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=11.31;ReadPosRankSum=-0.675;SOR=0.610 GT:AD:DP:GQ:PL 0/1:21,16:37:99:426,0,612
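The 0.4324324 value follows from the AD field of this record (21 reference reads, 16 alternate reads). As a rough sketch, the same calculation can be applied to any sample column that carries AD, including the NORMAL column of the Strelka2 germline record quoted earlier. The helper name `ad_vaf` is hypothetical.

```python
import math


def ad_vaf(format_keys: str, sample: str) -> float:
    """Allele frequency of the first ALT allele from the AD field (ref,alt,...)."""
    fields = dict(zip(format_keys.split(":"), sample.split(":")))
    depths = [int(x) for x in fields["AD"].split(",")]
    total = sum(depths)
    return depths[1] / total if total else math.nan


# HaplotypeCaller record for the normal sample.
hc_vaf = ad_vaf("GT:AD:DP:GQ:PL", "0/1:21,16:37:99:426,0,612")

# NORMAL column of the Strelka2 multi-sample germline record.
strelka_germline_vaf = ad_vaf(
    "GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL",
    "0/0:18:18:37:20,2:10,2:10,0:PASS:0,21,340",
)

print(f"HaplotypeCaller normal VAF  = {hc_vaf:.7f}")                # 16 / 37
print(f"Strelka2 germline normal VAF = {strelka_germline_vaf:.4f}")  # 2 / 22
```

For the same normal sample, HaplotypeCaller assigns 16 reads to the insertion while Strelka2 assigns 2 (germline AD) or 1 (somatic tier-1 TIR). This suggests the disagreement comes from how each caller assigns reads to alleles at this homopolymer site, not only from different calling thresholds.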

So I am confused about how to determine whether variants like this one are somatic or germline. Do you have any good suggestions?

Thank you very much.

sangtaekim commented 5 years ago

Can you clarify what the "normal-tumor germline mode" is? Did you run Strelka's multi-sample calling with the normal as the first sample? If so, Strelka germline reported PASS for "0/0", which matches your "single normal germline mode" result.

Strelka's scoring model is designed to tolerate a certain level of contamination of the normal sample by the tumor, which is common for liquid or late-stage tumor data. AFAIK, Mutect2 does not yet have this capability. This is one of the main claims of our paper. This site is a good example: a low AF in the normal is tolerated, and the variant is still called as somatic because of the high AF in the tumor.

qindan2008 commented 5 years ago

Yes, "normal-tumor germline mode" refers to Strelka2's multi-sample germline calling mode, with the normal as the first sample.

  1. The multi-sample germline calling result was:

1 241680511 . T TA 1361 NoPassedVariantGTs CIGAR=1M1I;RU=A;REFREP=3;IDREP=4;MQ=60 GT:GQ:GQX:DPI:AD:ADF:ADR:FT:PL 0/0:18:18:37:20,2:10,2:10,0:PASS:0,21,340 0/1:11:11:98:14,74:5,35:9,39:LowGQX:999,0,8

  2. In Strelka2's single-sample germline mode on the normal, I got no result at this site.

  3. The somatic mode result was:

1 241680511 . T TA . PASS SOMATIC;QSI=99;TQSI=1;NT=ref;QSI_NT=76;TQSI_NT=2;SGT=ref->het;MQ=60.00;MQ0=0;RU=A;RC=3;IC=4;IHP=3;SomaticEVS=9.50;EVSF=76,0.070186,0.80947,4,3,3,1,1.8525,2.7668,-4.1217,0.052632,0.83721,0.48649,0.13131,0.10811,0.010989,0.070186,-0.49174,0.49174 DP:DP2:TAR:TIR:TOR:DP50:FDP50:SUBDP50:BCN50 37:37:18,21:1,2:18,16:37.15:4.78:0.00:0.11 98:98:14,14:72,75:13,11:91.30:1.07:0.00:0.01

  4. Mutect2 somatic calling returned no result at this site.

  5. HaplotypeCaller showed a variant allele frequency of 0.4324324 in the normal, which indicates a germline variant.

1 241680511 . T TA 418.64 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.828;DP=42;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.000;QD=11.31;ReadPosRankSum=-0.675;SOR=0.610 GT:AD:DP:GQ:PL 0/1:21,16:37:99:426,0,612