andersen-lab / ivar

iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing.
https://andersen-lab.github.io/ivar/html/
GNU General Public License v3.0
115 stars 39 forks source link

ivar consensus: option -q is not filtering low quality bases #97

Open pmenzel opened 3 years ago

pmenzel commented 3 years ago

Consider an example output from samtools mpileup for the SARS-CoV2 mutation N501Y:

MN908947.3      23063   A       20      tttttttttttttttttttt    EEEEEAAAEEEAEEEEEE/A

i.e. there are 20 reads covering this position with a T, and one of the bases only has a quality score of 14 (encoded by /).

When setting a minimum coverage threshold of 20 (option -m 20) and minimum q-score of 20 (option -q 20) in ivar consensus, the program outputs a T in the consensus sequence, while it should be a N, since only 19 of the bases are at least quality 20.

echo "MN908947.3\t23063\tN\t20\ttttttttttttttttttttt\tEEEEEAAAEEEAEEEEEE/A" | ivar consensus -t 0.9 -m 20 -q 20 -n N -p test.fa && cat test.fa

Minimum Quality: 20
Threshold: 0.9
Minimum depth: 20
Regions with depth less than minimum depth covered by: N
Reference length: 1
Positions with 0 depth: 0
Positions with depth below 20: 0
>Consensus_test_threshold_0.9_quality_20
T

On the other hand, ivar variants properly filters the lower quality base and does not call the variant unless reducing the value for option -q.

ivar variants -q 20:

echo "MN908947.3\t23063\tN\t20\ttttttttttttttttttttt\tEEEEEAAAEEEAEEEEEE/A" | \
ivar variants -r nCoV-2019.reference.fasta -m 20 -p variants -q 20 -t 0.25 && column -t -s$'\t' variants.tsv

REGION  POS  REF  ALT  REF_DP  REF_RV  REF_QUAL  ALT_DP  ALT_RV  ALT_QUAL  ALT_FREQ  TOTAL_DP  PVAL  PASS  GFF_FEATURE  REF_CODON  REF_AA  ALT_CODON  ALT_AA

ivar variants -q 14:

echo "MN908947.3\t23063\tN\t20\ttttttttttttttttttttt\tEEEEEAAAEEEAEEEEEE/A" | \
ivar variants -r nCoV-2019.reference.fasta -m 20 -p variants -q 14 -t 0.25 && column -t -s$'\t' variants.tsv

REGION      POS    REF  ALT  REF_DP  REF_RV  REF_QUAL  ALT_DP  ALT_RV  ALT_QUAL  ALT_FREQ  TOTAL_DP  PVAL         PASS  GFF_FEATURE  REF_CODON  REF_AA  ALT_CODON  ALT_AA
MN908947.3  23063  A    T    0       0       0         20      20      33        1         20        1.45089e-11  TRUE  NA           NA         NA      NA         NA
gkarthik commented 3 years ago

Hello @pmenzel , thank you for pointing this out! The way it's setup right now, the variants take the quality into account while setting the threshold for depth but consensus does not take the quality into count when looking at minimum depth. I agree this is confusing and I will work on changing this behavior.