Illumina / strelka

Strelka2 germline and somatic small variant caller
GNU General Public License v3.0

All somatic SNVs filtered for some reason (mostly as "LowEVS" or "IRC") #7

Closed malachig closed 7 years ago

malachig commented 7 years ago

We recently tried to update to Strelka2 for the first time. The indel results look plausible, but all SNVs fail the filtering steps (almost all are marked "LowEVS", "IRC", or both).

Any ideas on what might be going wrong here?

Here are some example records for variants that were called by both MuTect and VarScan:

chr3    196487542   .   T   A   .   IRC DP=161;MQ=60;MQ0=0;NT=ref;QSS=94;QSS_NT=94;ReadPosRankSum=-0.15;SGT=TT->AT;SNVSB=0;SOMATIC;SomaticEVS=42.38;TQSS=2;TQSS_NT=2    GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU ./.:42:0:0:0:0,0:0,0:0,0:42,42  ./.:118:2:0:0:29,29:0,0:0,0:87,90
chr5    88195147    .   G   A   .   IRC DP=79;MQ=60;MQ0=0;NT=ref;QSS=102;QSS_NT=102;ReadPosRankSum=-0.64;SGT=GG->AG;SNVSB=0;SOMATIC;SomaticEVS=44.82;TQSS=1;TQSS_NT=1   GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU ./.:42:0:0:0:0,0:0,0:42,42:0,0  ./.:37:0:0:0:13,13:0,0:24,24:0,0
chr5    103555912   .   T   C   .   IRC DP=116;MQ=60;MQ0=0;NT=ref;QSS=83;QSS_NT=83;ReadPosRankSum=0.32;SGT=TT->CT;SNVSB=0;SOMATIC;SomaticEVS=38.06;TQSS=1;TQSS_NT=1 GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU ./.:46:0:0:0:0,0:0,0:0,0:46,47  ./.:69:0:0:0:0,0:13,13:0,0:56,56
chr5    132543517   .   G   C   .   IRC DP=94;MQ=60;MQ0=0;NT=ref;QSS=78;QSS_NT=78;ReadPosRankSum=-0.48;SGT=GG->CG;SNVSB=0;SOMATIC;SomaticEVS=41.82;TQSS=1;TQSS_NT=1 GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU ./.:33:0:0:0:0,0:0,0:33,33:0,0  ./.:61:0:0:0:0,0:13,13:48,48:0,0
chr9    83867656    .   G   C   .   IRC DP=69;MQ=45.9;MQ0=2;NT=ref;QSS=68;QSS_NT=68;ReadPosRankSum=-0.14;SGT=GG->CG;SNVSB=0;SOMATIC;SomaticEVS=28.74;TQSS=1;TQSS_NT=1   GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU ./.:22:0:0:0:0,0:0,0:22,22:0,0  ./.:45:0:0:0:0,0:9,11:36,36:0,0
chr19   10154788    .   G   A   .   IRC DP=273;MQ=60;MQ0=0;NT=ref;QSS=130;QSS_NT=130;ReadPosRankSum=1.09;SGT=GG->AG;SNVSB=0;SOMATIC;SomaticEVS=39.73;TQSS=1;TQSS_NT=1   GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU ./.:94:0:0:0:0,0:0,0:94,96:0,0  ./.:177:0:0:0:37,37:0,0:138,138:2,2
chr20   409866  .   T   A   .   IRC DP=263;MQ=59.94;MQ0=0;NT=ref;QSS=103;QSS_NT=103;ReadPosRankSum=-2.67;SGT=TT->AT;SNVSB=0;SOMATIC;SomaticEVS=36.96;TQSS=1;TQSS_NT=1   GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU ./.:78:0:0:0:0,0:1,1:0,0:77,78  ./.:184:0:0:0:35,35:0,0:0,0:149,149
chr20   33667668    .   G   A   .   IRC DP=405;MQ=60;MQ0=0;NT=ref;QSS=159;QSS_NT=157;ReadPosRankSum=0.49;SGT=GG->AG;SNVSB=0;SOMATIC;SomaticEVS=36.88;TQSS=2;TQSS_NT=2   GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU ./.:137:0:0:0:0,0:0,0:137,138:0,0   ./.:264:3:0:0:54,54:0,0:207,213:0,0
chrX    100408064   .   G   C   .   IRC DP=393;MQ=60;MQ0=0;NT=ref;QSS=158;QSS_NT=157;ReadPosRankSum=0.54;SGT=GG->CG;SNVSB=0;SOMATIC;SomaticEVS=37.41;TQSS=1;TQSS_NT=1   GT:DP:FDP:SDP:SUBDP:AU:CU:GU:TU ./.:103:0:0:0:0,0:0,0:103,103:0,0   ./.:288:2:0:0:0,0:71,74:215,216:0,0

ctsa commented 7 years ago

The IRC filter should not be coming from Strelka directly; could it be added by some post-processing step? For SNVs, the LowEVS filter is the primary mechanism used to suggest a default passing variant threshold. It is based on the output of the random forest "Empirical Variant Score" for the SNV, which is provided in the INFO/SomaticEVS field.
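
To check which filters are attached to a record and what score it received, the FILTER column and INFO/SomaticEVS can be read directly from the VCF line. The following is a minimal sketch, not part of Strelka; the function names `parse_info` and `evs_and_filters` are hypothetical, and the sample line is the first eight columns of the chr9 record posted above.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag keys without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def evs_and_filters(vcf_line):
    """Return (SomaticEVS or None, list of filter names) for one VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    # Column 7 (index 6) is FILTER; "PASS" or "." means no filter was applied.
    filters = [] if cols[6] in ("PASS", ".") else cols[6].split(";")
    info = parse_info(cols[7])
    evs = float(info["SomaticEVS"]) if "SomaticEVS" in info else None
    return evs, filters


# First eight columns of the chr9 record from this thread.
record = "\t".join([
    "chr9", "83867656", ".", "G", "C", ".", "IRC",
    "DP=69;MQ=45.9;MQ0=2;NT=ref;QSS=68;QSS_NT=68;ReadPosRankSum=-0.14;"
    "SGT=GG->CG;SNVSB=0;SOMATIC;SomaticEVS=28.74;TQSS=1;TQSS_NT=1",
])

evs, filters = evs_and_filters(record)
print(evs, filters)
```

Tallying the filter names this way across a whole file makes it easy to see whether a filter such as IRC appears in the caller's raw output or only after a later step.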

malachig commented 7 years ago

Thanks @ctsa. Sorry, my bad. This is in fact being introduced by a post-processing step.

The issue seems to be that our post-processing code assumes the presence of a GT field, which Strelka doesn't produce. It looks like we can get what we need from NT (normal genotype) and SGT (somatic genotype) though.
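
For anyone hitting the same problem, here is a minimal sketch of how GT-style strings could be derived from SGT for SNV records like the ones above. It is not the code we ended up using; the function name `sgt_to_gt` is hypothetical, and it assumes SGT has the form `<normal bases>-><tumor bases>` (for example `TT->AT`), as seen in these SNV records. It is not intended for indel records.

```python
def sgt_to_gt(sgt, ref, alts):
    """Convert an SNV SGT value such as 'TT->AT' into (normal_gt, tumor_gt).

    ref is the REF base and alts is the list of ALT bases. A genotype that
    contains a base found in neither REF nor ALT is reported as './.'.
    """
    alleles = [ref] + list(alts)

    def encode(bases):
        indices = []
        for base in bases:
            if base not in alleles:
                return "./."
            indices.append(alleles.index(base))
        # Sort so that a het call is always written as 0/1 rather than 1/0.
        return "/".join(str(i) for i in sorted(indices))

    normal, tumor = sgt.split("->")
    return encode(normal), encode(tumor)


# chr3:196487542 T>A from the records above: SGT=TT->AT
print(sgt_to_gt("TT->AT", "T", ["A"]))
```

The resulting strings could then be written into a GT entry for the normal and tumor sample columns before the records reach any downstream code that expects one.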