AstraZeneca-NGS / VarDictJava

VarDict Java port
MIT License

Variants missing in amplicon samples #325

Open PabloCabaleiro opened 3 years ago

PabloCabaleiro commented 3 years ago

I have a problem with VarDict 1.8.2 when calling variants in amplicon-based sequencing samples. I attach a sample with a SNP and an indel that are not called by this version or by VarDict 1.6.0. However, the SNP is called by VarDict 1.5.8. I have also run the sample through bcftools call, and both variants are reported using the same files. The command and its output are as follows:

samtools mpileup -A -B -h 400 -C10 -m 3 -F0.0002 -L 100000 -d 100000 -DSgu -f GRCh37.fa -l regions.bed example.bam | bcftools call -mv -

13  32913558    .   CAAAAAAA    CAAAAAA 64  .   INDEL;IDV=91;IMF=0.134417;DP=677;VDB=0;SGB=-0.693145;MQSB=1;MQ0F=0;ICB=1;HOB=0.5;AC=1;AN=2;DP4=70,0,39,1;MQ=60  GT:PL:DP:SP 0/1:97,0,150:110:0
13  32929387    .   T   C   228 .   DP=1746;VDB=1;SGB=-0.693147;RPB=1;MQB=1;MQSB=1;BQB=1;MQ0F=0;AC=2;AN=2;DP4=0,1,609,532;MQ=60 GT:PL:DP:SP 1/1:255,255,0:1142:0

The command executed with VarDict:

VarDict_1.8.2 -q 10 -th 10 -N p -f 0.1 -G GRCh37.fa -b example.bam -c 1 -S 2 -E 3 -g 4 regions.bed --nosv -P 1 -F 0 | teststrandbias-1.8.2.R | var2vcf_valid-1.8.2.pl -d 100 -f 0.1 -E -A

I’ve also tried VarDict’s amplicon mode, with the same results:

VarDict_1.8.2 -q 10 -th 10 -N p -f 0.1 -G GRCh37.fa -b example.bam -c 1 -S 2 -E 3 -g 4 regions_amplicon.bed --nosv -P 1 -F 0 | teststrandbias-1.8.2.R | var2vcf_valid-1.8.2.pl -d 100 -f 0.1 -E -A

The SNP, which is reported only by VarDict 1.5.8, was called as follows:

13  32929387    .   T   C   320 p8  SAMPLE=p;TYPE=SNV;DP=529;VD=525;AF=0.9924;BIAS=0:1;REFBIAS=0:2;VARBIAS=0:525;PMEAN=1;PSTD=0;QUAL=35.5;QSTD=1;SBF=1;ODDRATIO=0;MQ=60;SN=1050;HIAF=0.9924;ADJAF=0;SHIFT3=0;MSI=1;MSILEN=1;NM=1.5;HICNT=525;HICOV=529;LSEQ=AAACAACTCCAATCAAGCAG;RSEQ=AGCTGTAACTTTCACAAAGT;DUPRATE=0;SPLITREAD=0;SPANPAIR=0 GT:DP:VD:AD:AF:RD:ALD   1/1:529:525:2,525:0.9924:0,2:0,525
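To compare callers quickly, it can help to check each output for the two positions reported by bcftools above. This is a minimal sketch, assuming the VarDict pipeline output was redirected to a file; the file name `vardict.vcf` and the helper `has_variant` are hypothetical and not part of VarDict or bcftools.

```shell
# has_variant CHROM POS FILE
# Succeeds if FILE contains a non-header record at CHROM:POS.
# Relies on CHROM and POS being the first two whitespace-separated fields.
has_variant() {
  awk -v c="$1" -v p="$2" \
    '!/^#/ && $1 == c && $2 == p { found = 1 } END { exit !found }' "$3"
}

vcf="vardict.vcf"   # hypothetical name for the redirected VarDict output
if [ -f "$vcf" ]; then
  # Positions taken from the bcftools output in this report.
  for pos in 32913558 32929387; do
    if has_variant 13 "$pos" "$vcf"; then
      echo "13:$pos called"
    else
      echo "13:$pos missing"
    fi
  done
fi
```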

I attach the alignments and the regions of interest. Could this be a configuration issue? I'd be very grateful for any suggestions!

example_data.zip

PolinaBevad commented 3 years ago

Hi Pablo,

I've looked at the data, and the problem seems to be the -q 10 option. The complex variants TAGC->CAGA/G are also possible here, and VarDict has changed how it recognizes complex variants since 1.5.8. If you increase the -q threshold, the variant will appear, because the bases from the complex variant have low base quality. Even -q 11 will work.

Hope this helps!
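The suggestion above can be tried by re-running the command from the report with different -q values. The following is a minimal sketch that only prints the commands. It assumes the same tool names and input files as in the original report. The helper `vardict_cmd` and the output names `calls_q<N>.vcf` are hypothetical, and the values 15 and 20 are arbitrary illustrations; only 10 and 11 come from this thread.

```shell
# vardict_cmd Q
# Prints the VarDict pipeline from the report with -q set to Q.
# Only -q and the (hypothetical) output file name differ from the original command.
vardict_cmd() {
  printf '%s\n' "VarDict_1.8.2 -q $1 -th 10 -N p -f 0.1 -G GRCh37.fa -b example.bam -c 1 -S 2 -E 3 -g 4 regions.bed --nosv -P 1 -F 0 | teststrandbias-1.8.2.R | var2vcf_valid-1.8.2.pl -d 100 -f 0.1 -E -A > calls_q$1.vcf"
}

# Dry run: list one command per threshold.
for q in 10 11 15 20; do
  vardict_cmd "$q"
done
```

Piping the printed lines to `bash` would execute them, after which the resulting files can be compared to see at which threshold the SNP at 13:32929387 appears.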

PabloCabaleiro commented 3 years ago

Hi Polina,

Thank you very much for your fast answer. I will try to optimize the -q option with a few more samples and report back with the results.

Regards