AstraZeneca-NGS / VarDictJava

VarDict Java port
MIT License
128 stars 57 forks source link

Variant with quality only gets called when decreasing the minimum phred score #357

Closed rollf closed 2 years ago

rollf commented 2 years ago

Hi,

I'm having an issue calling a specific variant with VarDict 1.8.2. Please find the attached rs2819561-not-called.tar.gz archive that can be used to reproduce the issue (given a path to your GRCh37.fa as well as the VarDict command available). The sample bam file contains the variant rs2819561 (NC_000003.11:g.4403767A>G in GRCh37.p13).

image

It does not get called by default. The read quality is good, the mapping quality at the given base is good, too. For me, there is no obvious reason why this should not get called.

cat run.sh
...
VarDict -G "$REF_GENOME" -N rs2819561 -b $BAM $BED -c 1 -S 2 -E 3 -g 4 --header "$@" | cut -f 1,3-7,19-23
...

./run.sh
Sample  Chr     Start   End     Ref     Alt     QMean   QStd    MQ      Sig_Noise       HiAF # no variant gets called

Decreasing the minimum phred score base quality to 14 yields the variant

./run.sh -q 14
Sample  Chr     Start   End     Ref     Alt     QMean   QStd    MQ      Sig_Noise       HiAF
rs2819561       3       4403767 4403767 A       G       26.4    1       59.8    364.000 0.5215 # <-- that's the one I'm waiting for
rs2819561       3       4403767 4403767 A       C       14.0    0       56.0    10.000  0.0143
rs2819561       3       4403767 4403767 A       T       14.0    0       59.4    10.000  0.0143
rs2819561       3       4403766 4403766 G       C       14.0    0       60.0    10.000  0.0143

Unfortunately, it also calls 3 other (false positive) variants. Also, the first variant gets QMean = 26.4 so I would have assumed it should get called by default (given -q = 22.5 by default). Do you see any way for me to call this variant but not the others?

Thanks in advance!

rollf commented 2 years ago

Hi,

would you have time to have a look into this?

Thanks.

calocean commented 2 years ago

I have the same problem. Have you solved it?

PolinaBevad commented 2 years ago

Hi Rolf, This variant should be filtered out because of low SN (proportion of high quality reads to low quality reads). It seems that there a lot of reads that support this variant has low base quality on this position, at least when you check in IGV, almost half of reads has phred quality ~14 there. You can try to decrease a little SN option threshold (-o, default is 1.5) instead of -q option, this can help against FP.

rollf commented 2 years ago

Thanks @PolinaBevad for your reply including the hints. Your are right, running with -o 1.3 (in my particular case) yields the variant without the FP ones.