AstraZeneca-NGS / VarDictJava

VarDict Java port
MIT License

Low VAF variant missing in a region with a set of mismatches on Amplicon data #365

Open gaosong0329 opened 2 years ago

gaosong0329 commented 2 years ago

Hi,

I recently found that a low-VAF variant is not called in one of our samples sequenced with amplicon data. However, when this sample is repeated and has slightly higher coverage, VarDict does call that variant. In the first run, the variant does not appear even in the pileup output produced with -p. More details are below:

The IGV screenshots of the two samples are below. Based on the BAMs, both samples have the same allele frequency for T, 6%. However, the variant is detected only in sample 14.

[IGV screenshot of the two samples]

When I check the pileup generated with the "-p" parameter, I get the following stats for the two samples: sample 14 has variant T with AltSupport 90, but sample 4 does not have T detected at all.

[Screenshot of the -p pileup stats for the two samples]

The region in the 8-column BED file is:

chr9    5069926 5070131 JAK2_ex12_529_547RH.7290FAABE7B14DAZ0Z  0   +   5069952 5070104
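The BED line above can be read column by column. As a minimal sketch (the helper and its names are hypothetical, not part of VarDict), the snippet below splits the line, takes chromosome, start and end from columns 1 to 3 (matching -c 1 -S 2 -E 3), and, assuming columns 7 and 8 mark the insert between the primers as in the usual amplicon BED layout, reports the amplicon span and the primer-trimmed insert as plain coordinate differences.

```python
from typing import NamedTuple


class Amplicon(NamedTuple):
    chrom: str         # column 1 (-c 1)
    start: int         # column 2 (-S 2)
    end: int           # column 3 (-E 3)
    name: str          # column 4
    insert_start: int  # column 7 (assumption: insert start, primers excluded)
    insert_end: int    # column 8 (assumption: insert end, primers excluded)


def parse_amplicon_line(line: str) -> Amplicon:
    """Hypothetical helper: parse one 8-column amplicon BED line."""
    cols = line.split()  # any whitespace, so tab- or space-separated both work
    if len(cols) < 8:
        raise ValueError(f"expected 8 columns, got {len(cols)}")
    return Amplicon(cols[0], int(cols[1]), int(cols[2]), cols[3],
                    int(cols[6]), int(cols[7]))


line = ("chr9\t5069926\t5070131\tJAK2_ex12_529_547RH.7290FAABE7B14DAZ0Z"
        "\t0\t+\t5069952\t5070104")
amp = parse_amplicon_line(line)
print(amp.chrom, amp.end - amp.start, amp.insert_end - amp.insert_start)
print("left primer:", amp.insert_start - amp.start,
      "right primer:", amp.end - amp.insert_end)
```

For this region the script prints `chr9 205 152` followed by `left primer: 26 right primer: 27`.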

The parameter I use to run VarDict is:

vardict-java -G {REF} \\
        -N {wildcards.sample} -b {input.bam} \\
        -X 0 -I 350 --mfreq 0.1 --nosv --fisher \\
        -f 0.01 -r 5 \\
        -c 1 -S 2 -E 3 {input.bed} 
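The command appears to be a Snakemake shell template ({REF}, {wildcards.sample}, {input.bam}, {input.bed}). As a hedged sketch, the function below builds the concrete argument list for one sample, with or without the -m option. The function name and file paths are hypothetical placeholders; only flags from the original command plus -m are used, and options whose meaning isn't discussed here are passed through unchanged.

```python
import shlex
from typing import List, Optional


def vardict_cmd(ref: str, sample: str, bam: str, bed: str,
                max_mismatches: Optional[int] = None) -> List[str]:
    """Hypothetical helper: argv for the command used in this report."""
    cmd = [
        "vardict-java",
        "-G", ref,     # reference genome FASTA
        "-N", sample,  # sample name
        "-b", bam,     # input BAM
        # options below are copied unchanged from the original command
        "-X", "0", "-I", "350", "--mfreq", "0.1", "--nosv", "--fisher",
        "-f", "0.01",  # minimum allele frequency
        "-r", "5",     # minimum number of variant-supporting reads
        "-c", "1", "-S", "2", "-E", "3",  # BED columns: chrom, start, end
    ]
    if max_mismatches is not None:
        cmd += ["-m", str(max_mismatches)]  # per-read mismatch cap
    cmd.append(bed)  # region file goes last, as in the original command
    return cmd


# Placeholder paths; compare the default run with the -m 12 run
print(shlex.join(vardict_cmd("ref.fa", "sample4", "sample4.bam", "amplicons.bed")))
print(shlex.join(vardict_cmd("ref.fa", "sample4", "sample4.bam", "amplicons.bed", 12)))
```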

If I add "-m 12", VarDict then detects the variant in sample 4.

I understand that with -m 12, reads with more mismatches are allowed. Since some reads in this region contain quite a number of mismatches, those reads are filtered out under the default threshold. However, I would expect that even with the default value, variant T should still show up in the pileup, supported by the reads that are relatively clean.
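The mismatch filter described above can be sketched as a toy example with hypothetical read counts (not taken from either sample). It assumes reads with more than m mismatches are dropped before counting, compares a cap of 8 (assumed here to be the default) with 12, and checks only simplified versions of the -f 0.01 and -r 5 thresholds from the command; real VarDict applies many more filters. It shows one way the allele can vanish from the pileup entirely: if the T-carrying reads are also the noisy ones. It says nothing about whether that is actually the case in sample 4.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Read:
    mismatches: int  # mismatch count for the read (toy value)
    has_alt: bool    # True if the read carries the T allele at the site


def site_counts(reads: List[Read], max_mismatches: int) -> Tuple[int, int]:
    """Return (depth, alt reads) after dropping reads with > max_mismatches."""
    kept = [r for r in reads if r.mismatches <= max_mismatches]
    return len(kept), sum(r.has_alt for r in kept)


def passes(depth: int, alt: int, min_af: float = 0.01, min_reads: int = 5) -> bool:
    """Simplified stand-in for the -f / -r thresholds only."""
    return depth > 0 and alt >= min_reads and alt / depth >= min_af


# Hypothetical site: 90 clean ref reads, 4 noisy ref reads, 6 noisy T reads
reads = [Read(1, False)] * 90 + [Read(10, False)] * 4 + [Read(10, True)] * 6

for m in (8, 12):
    depth, alt = site_counts(reads, m)
    print(m, depth, alt, passes(depth, alt))
```

With these toy numbers the cap of 8 leaves depth 90 with no T reads, while the cap of 12 gives depth 100 with 6 T reads (6%), which clears both thresholds.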

Any suggestions or hints are welcome! Thank you very much!

Cheers,
Song