AstraZeneca-NGS / VarDictJava

VarDict Java port
MIT License

Candidate Somatic not filtered with -M? #247

Closed: Amfgcp closed this 5 years ago

Amfgcp commented 5 years ago

Hi, I'm using version 1.6.0. I was hoping that running the var2vcf_paired.pl script with the -M flag (used to output only candidate somatic variants) would filter out variants marked with STATUS=Germline, but that is not the case. Is this expected? For example, I got:

chr1    974039  .   C   T   168 PASS    AF=0.5417;DP=96;LSEQ=GGGGGGCCGCAGCCCCCAGA;MSI=1.000;MSILEN=1;RSEQ=GCCCCTCAGCTTGTGAGTAG;SAMPLE=NIC3T;SHIFT3=0;SOR=1.73047;SSF=0.03247;STATUS=Germline;TYPE=SNV;VD=52 GT:AD:ADJAF:AF:ALD:BIAS:DP:HIAF:MQ:NM:ODDRATIO:PMEAN:PSTD:QSTD:QUAL:RD:SBF:SN:VD    0/1:69,47:0.0086:0.4052:17,30:2,2:116:0.3964:60:1.5:1.4351:37.7:1:1:28.7:31,38:0.44281:14.667:47    1/0:44,52:0.0208:0.5417:17,35:2,2:96:0.5368:60:1.2:1.17447:46.5:1:1:29.6:16,28:0.82968:51:52

whereas I was expecting some kind of germline filter instead of PASS.

The command used was:

...
var2vcf_paired.pl \
-N "NIC3T|NIC3N" \
 \
-M \
-A \
-Q 20 \
-d 8 \
-v 4 \
-f 0.02 \
> NIC3T-NIC3N-scatter1.bed.vcf

and I see that the other filters are apparently working as intended:

chr1    974476  .   C   A   30  P0.05;Q20;SN1.5;f0.02;p8;pSTD;q22.5;v4  AF=0;DP=173;LSEQ=AGGAGGAGCCGGGACCCCGG;MSI=1.000;MSILEN=1;RSEQ=TACGACCACCTCTGGGACGA;SAMPLE=NIC3T;SHIFT3=0;SOR=0;SSF=0.26634;STATUS=StrongLOH;TYPE=SNV;VD=0   GT:AD:ADJAF:AF:ALD:BIAS:DP:HIAF:MQ:NM:ODDRATIO:PMEAN:PSTD:QSTD:QUAL:RD:SBF:SN:VD    0/0:183,2:0:0.0108:1,1:2,2:185:0.011:60:2:1.13:52:1:0:30:86,97:1:4:2    0/0:172,0:0:0:0,0:0:173:0:0:0:0:0:0:0:0:82,90:1:0:0
chr1    974537  .   G   A   30  P0.05;f0.02;v4  AF=0.0124;DP=161;LSEQ=AGAAGTGCCCCCAGCTTGGA;MSI=3.000;MSILEN=1;RSEQ=GGCCTGAGGCCAGTGGGGGG;SAMPLE=NIC3T;SHIFT3=0;SOR=Inf;SSF=0.25394;STATUS=StrongSomatic;TYPE=SNV;VD=2    GT:AD:ADJAF:AF:ALD:BIAS:DP:HIAF:MQ:NM:ODDRATIO:PMEAN:PSTD:QSTD:QUAL:RD:SBF:SN:VD    0/0:158,0:0:0:0,0:2,0:158:1:60:0.2:0:40.7:1:1:29.8:73,85:1:78:0 0/0:159,2:0:0.0124:1,1:2,2:161:0.0127:60:1:1.12:54:1:0:30:75,84:1:4:2
chr1    1071610 .   G   A   60  P0.05;d8    AF=1;DP=4;LSEQ=GGGATCCTCTTGTCCACCCC;MSI=1.000;MSILEN=1;RSEQ=TCAGGACCCAGCCTGGAGAA;SAMPLE=NIC3T;SHIFT3=0;SOR=0;SSF=1;STATUS=Germline;TYPE=SNV;VD=4    GT:AD:ADJAF:AF:ALD:BIAS:DP:HIAF:MQ:NM:ODDRATIO:PMEAN:PSTD:QSTD:QUAL:RD:SBF:SN:VD    1/1:0,4:0:1:4,0:0,0:4:1:60:1.5:0:45.2:1:0:30:0,0:1:8:4  1/1:0,4:0:1:4,0:0,0:4:1:60:1:0:45.8:1:0:30:0,0:1:8:4

Were my expectations wrong? Thanks

PolinaBevad commented 5 years ago

Hi @Amfgcp,

This option worked as a "hard" filter (meaning it kept only somatic variants) until about 4 years ago, when it was decided to change its behavior to a "soft" filter, as you can see. Now it leaves germline variants in the output and either marks them as "PASS" if they pass all filters or adds the specific FILTER values for the filters they fail. I think in your case the germline variant's p-value is below the default 0.05 significance threshold and no other filters (quality, frequency...) were triggered, so it is "PASS".
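To check how the soft filter played out on a given output file, one option is to tally each STATUS value against whether FILTER is PASS. This is a minimal sketch and not part of VarDict: the function name `status_filter_counts` is hypothetical, and it assumes tab-delimited VCF records with STATUS in the INFO column (column 8), as in the records quoted above.

```python
from collections import Counter


def status_filter_counts(lines):
    """Count (STATUS, outcome) pairs, where outcome is 'PASS' or 'filtered'."""
    counts = Counter()
    for line in lines:
        # Skip VCF header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        # INFO is a ';'-separated list of KEY=VALUE items (flags have no '=').
        info = dict(item.split("=", 1) for item in cols[7].split(";") if "=" in item)
        status = info.get("STATUS", "unknown")
        outcome = "PASS" if cols[6] == "PASS" else "filtered"
        counts[(status, outcome)] += 1
    return counts
```

Called on an open VCF file handle, it returns a `Counter`; a non-zero count for `("Germline", "PASS")` corresponds to the situation described in the question.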

The change to the -M option was made here: https://github.com/AstraZeneca-NGS/VarDict/commit/9a98d0a2ff9ca2c043273c5fb5c206dc5704e27e#diff-794ef71df5d793096f2b59c24540beafL186
If you need to hard-filter germline variants, you can change the 2 lines that use opt_M in your copy of var2vcf_paired.pl back to their previous implementation.
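If you would rather not modify var2vcf_paired.pl, another option is to post-process its output and drop the unwanted records. This is a minimal sketch and not the script's previous implementation: the names `info_field` and `drop_by_status` are hypothetical, the default of dropping only STATUS=Germline is an assumption taken from the question, and tab-delimited VCF input is assumed.

```python
def info_field(info, key):
    """Return the value of `key` in a VCF INFO string, or None if absent."""
    for item in info.split(";"):
        name, sep, value = item.partition("=")
        if sep and name == key:
            return value
    return None


def drop_by_status(lines, drop=("Germline",)):
    """Yield header lines and every record whose STATUS is not in `drop`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the VCF header untouched
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 8 (index 7) is INFO; records without STATUS are kept.
        if len(cols) > 7 and info_field(cols[7], "STATUS") in drop:
            continue
        yield line
```

To use it as a stream filter, something like `sys.stdout.writelines(drop_by_status(sys.stdin))` (after `import sys`) can be placed at the end of var2vcf_paired.pl's output pipeline. Pass a different `drop` tuple if other STATUS values should also be removed.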

Amfgcp commented 5 years ago

Understood, thank you for the quick answer @PolinaBevad