andersen-lab / ivar

iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing.
https://andersen-lab.github.io/ivar/html/
GNU General Public License v3.0

Issue with calling deletion in SARS-CoV-2 #86

Open kea027 opened 3 years ago

kea027 commented 3 years ago

Hi, our group is using iVar in our ARTIC SARS-CoV-2 workflow. When calling variants, it lists the deletion at 28,254 with stats that don't match the frequency, and with POS shifted by -1 (28253). See below. What could be causing this issue?

REGION | POS | REF | ALT | REF_DP | REF_RV | REF_QUAL | ALT_DP | ALT_RV | ALT_QUAL | ALT_FREQ | TOTAL_DP | PVAL | PASS
MN908947.3 | 28253 | C | -A | 15711 | 7911 | 40 | 15758 | 0 | 20 | 0.989451 | 15926 | 0 | TRUE

gkarthik commented 3 years ago

Hello, ivar reports deletions as occurring at the previous position since the next position has been deleted. This is more of a semantics thing. When calculating frequency stats for deletions, we use the depth from the previous position so occasionally this might throw things off.
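The numbers in the row reported above are consistent with ALT_FREQ being ALT_DP divided by TOTAL_DP at the anchor position (28253). The following is a minimal Python sketch that checks this. The formula is inferred from the reported values, not taken from iVar's source code. One possible explanation for REF_DP + ALT_DP exceeding TOTAL_DP, stated here as an assumption, is that reads carrying the deletion still show the reference base at the anchor position and so are counted in both columns.

```python
# Values copied from the MN908947.3:28253 row reported in this thread.
row = {"REF_DP": 15711, "ALT_DP": 15758, "TOTAL_DP": 15926, "ALT_FREQ": 0.989451}


def alt_freq(alt_dp, total_dp):
    """Return ALT_DP / TOTAL_DP, or 0.0 when there is no coverage.

    Assumption: this mirrors how the reported ALT_FREQ was derived;
    it is inferred from the numbers in the thread, not from iVar's code.
    """
    return alt_dp / total_dp if total_dp else 0.0


recomputed = alt_freq(row["ALT_DP"], row["TOTAL_DP"])

# How many reads appear to be counted in both REF_DP and ALT_DP.
overlap = row["REF_DP"] + row["ALT_DP"] - row["TOTAL_DP"]

print(f"recomputed ALT_FREQ = {recomputed:.6f}")
print(f"REF_DP + ALT_DP exceeds TOTAL_DP by {overlap} reads")
```

If the assumption holds, REF_DP on a deletion row is not a count of reads that lack the deletion, which would explain why the columns do not add up.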

a-diamant commented 3 years ago

Hello! We ran into the same issue. Do you have any suggestions for how we could count the frequency of deletions? For instance, here is an output from iVar:

POS | REF | ALT | REF_DP | REF_RV | REF_QUAL | ALT_DP | ALT_RV | ALT_QUAL | ALT_FREQ | TOTAL_DP
21991 | t | -TA | 78 | 40 | 28 | 18 | 0 | 20 | 0.0390456 | 461

iVar reports an alt frequency of 0.04, but total_dp - ref_dp = 461 - 78 = 383, so I would expect an alt frequency of 383/461 = 0.83.

Here is the same sample and the same line from the raw ACTG count file that we have from the Artic pipeline:

sequence | pos | ref | depth | A | C | G | T | others
MN908947.3 | 21991 | T | 439 | 4 | 8 | 7 | 122 | -TTA:298

The alt frequency here is 298/439 = 0.68.

This number is closer to what I see in the IGV viewer.

In our experience, iVar consistently underestimates the frequencies of deletions.
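The three estimates for position 21991 discussed in this comment can be put side by side. This is only a sketch using the numbers quoted in this thread. The dictionary keys and estimate names are illustrative labels, not iVar or ARTIC identifiers.

```python
# Numbers copied from the two position-21991 rows quoted in this thread.
ivar = {"REF_DP": 78, "ALT_DP": 18, "TOTAL_DP": 461}
raw = {"depth": 439, "deletion_reads": 298}

estimates = {
    # What iVar printed: ALT_DP / TOTAL_DP.
    "ivar_reported": ivar["ALT_DP"] / ivar["TOTAL_DP"],
    # Everything that is not the reference base, relative to total depth.
    "non_reference": (ivar["TOTAL_DP"] - ivar["REF_DP"]) / ivar["TOTAL_DP"],
    # Deletion-supporting reads over depth in the raw count file.
    "raw_counts": raw["deletion_reads"] / raw["depth"],
}

for name, value in estimates.items():
    print(f"{name:>14}: {value:.3f}")
```

Note that the "non_reference" figure counts every read that does not match the reference, including plain base mismatches, so it is better read as an upper bound than as a deletion frequency.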

Here is another illustration, where I plotted the frequencies of the alterations described for B.1.1.7. The situation I described repeats for all of our samples (the yellow color marks frequencies below 0.2 and corresponds to positions 21765 and 21991, where the deletions occur): heatmap

kea027 commented 3 years ago

Hi Anna, in my lab we've been using iVar in combination with another variant caller, LoFreq, to circumvent this issue. In addition to the frequency issue with indels, we've noticed that indel calls by iVar all have a Q20 score listed. Has this been your experience?

a-diamant commented 3 years ago

Thank you for your suggestion! We haven't used the quality scores from iVar, but it seems someone has already reported an issue about them.

cutpatel commented 3 years ago

> Hello, ivar reports deletions as occurring at the previous position since the next position has been deleted. This is more of a semantics thing. When calculating frequency stats for deletions, we use the depth from the previous position so occasionally this might throw things off.

Why is the frequency of the previous position used for deletions? Since the position is present in the reference, couldn't you just count the frequency as usual? Reads either show it or not. The only problem is insertions: an insertion has no reference position to count against, so it always ends up at ~100%. In our variant calling we take the frequencies of the previous and following positions to get an artificial frequency for the insertion. How does iVar handle insertions?
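The two counting schemes suggested in this comment could look roughly like the Python sketch below. The function names are hypothetical, and averaging the two flanking depths is an assumption about what "taking the previous and following position" means. Neither function is how iVar works.

```python
def deletion_freq(del_reads, depth_at_pos):
    """Deletion frequency counted at the deleted reference position itself.

    depth_at_pos is assumed to include the reads that carry the deletion.
    """
    return del_reads / depth_at_pos if depth_at_pos else 0.0


def insertion_freq(ins_reads, depth_prev, depth_next):
    """Artificial insertion frequency from the two flanking positions.

    Assumption: the denominator is the mean depth of the reference
    positions immediately before and after the insertion.
    """
    flank_depth = (depth_prev + depth_next) / 2
    return ins_reads / flank_depth if flank_depth else 0.0


# Deletion example using the raw counts quoted earlier in the thread.
print(f"deletion:  {deletion_freq(298, 439):.3f}")

# Insertion example with made-up numbers, for illustration only.
print(f"insertion: {insertion_freq(50, 90, 110):.3f}")
```

Using the mean of the flanking depths keeps the insertion frequency from being pinned near 100%, at the cost of being an approximation when coverage changes sharply between the two positions.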