Illumina / strelka

Strelka2 germline and somatic small variant caller
GNU General Public License v3.0

inconsistent reproducibility of variants #227

Open anoronh4 opened 1 year ago

anoronh4 commented 1 year ago

We are seeing reproducibility issues when running Strelka2 to call germline and somatic variants. One issue is the base counts for tiers 1 and 2: the AU:CU:GU:TU fields are often blank (".") in one run but populated in another. Take the following variant as an example.

Run 1:

19  8615160 .   G   T   .   LowEVS  SOMATIC;QSS=1;TQSS=1;NT=ref;QSS_NT=1;TQSS_NT=1;SGT=GG->GG;DP=180;MQ=60;MQ0=0;ReadPosRankSum=-0.17;SNVSB=0;SomaticEVS=1.14;EVSF=1,1,0.024793,60,0,0,-0.16669,-1.2052,0,0,32,42,0,0   DP:FDP:SDP:SUBDP:AU:CU:GU:TU    56:0:0:0:0,0:0,0:56,57:0,0  121:0:0:0:0:.:.:.

Run 2:

19  8615160 .   G   T   .   LowEVS  SOMATIC;QSS=1;TQSS=1;NT=ref;QSS_NT=1;TQSS_NT=1;SGT=GG->GG;DP=180;MQ=60;MQ0=0;ReadPosRankSum=-0.17;SNVSB=0;SomaticEVS=1.14;EVSF=1,1,0.024793,60,0,0,-0.16669,-1.2052,0,0,32,42,0,0   DP:FDP:SDP:SUBDP:AU:CU:GU:TU    56:0:0:0:0,0:0,0:56,57:0,0  121:0:0:0:0,0:0,0:118,120:3,3

We also found that one run has 13 fewer total variants than the other, and we have noticed this kind of inconsistency in other samples. Is this expected, and is there any way to ensure reproducibility or to catch such errors more reliably?
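For anyone who wants to check their own outputs for the same problem, here is a minimal sketch of how two runs could be compared. It is not part of Strelka2. It assumes uncompressed, tab-separated VCF lines, and the helper names `parse_vcf_lines` and `compare_runs` are made up for illustration. It reports records present in only one run and per-sample differences in the AU/CU/GU/TU fields.

```python
def parse_vcf_lines(lines):
    """Map (chrom, pos, ref, alt) -> list of per-sample FORMAT dicts.

    Assumes plain tab-separated VCF text with at least one sample column.
    """
    records = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split("\t")
        if len(cols) < 10:
            continue  # no FORMAT/sample columns to compare
        chrom, pos, _id, ref, alt = cols[:5]
        keys = cols[8].split(":")
        # A truncated sample column simply yields fewer keys in its dict.
        samples = [dict(zip(keys, s.split(":"))) for s in cols[9:]]
        records[(chrom, int(pos), ref, alt)] = samples
    return records


def compare_runs(run1, run2, fields=("AU", "CU", "GU", "TU")):
    """Return (only_in_run1, only_in_run2, mismatches).

    Each mismatch is (record_key, sample_index, field, value_run1, value_run2).
    """
    only1 = sorted(set(run1) - set(run2))
    only2 = sorted(set(run2) - set(run1))
    mismatches = []
    for key in sorted(set(run1) & set(run2)):
        for idx, (s1, s2) in enumerate(zip(run1[key], run2[key])):
            for field in fields:
                if s1.get(field) != s2.get(field):
                    mismatches.append(
                        (key, idx, field, s1.get(field), s2.get(field))
                    )
    return only1, only2, mismatches
```

To use it, pass the lines of each run's VCF, for example `compare_runs(parse_vcf_lines(open(path1)), parse_vcf_lines(open(path2)))`, where `path1` and `path2` are placeholders for the two decompressed output files. For the variant above, this would flag the second sample column, where Run 1 has `0:.:.:.` and Run 2 has `0,0:0,0:118,120:3,3`.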