Open pmenzel opened 3 years ago
Seeing the same issue with the 6p deletion GAGTTCA22028G:EFR156G
in Delta:
When setting -q 10
, the P-value is all good:
samtools mpileup -A -d 0 --reference nCoV-2019.reference.fasta -B -Q 0 mapped.primertrimmed.sorted.bam | \
grep -w 22028 | \
ivar variants -r nCoV-2019.reference.fasta -m 20 -p variants -q 10 -t 0.25 && column -t -s$'\t' variants.tsv
[mpileup] 1 samples in 1 input files
[mpileup] Max depth set to maximum value (2147483647)
A GFF file containing the open reading frames (ORFs) has not been provided. Amino acid translation will not be done.
REGION POS REF ALT REF_DP REF_RV REF_QUAL ALT_DP ALT_RV ALT_QUAL ALT_FREQ TOTAL_DP PVAL PASS GFF_FEATURE REF_CODON REF_AA ALT_CODON ALT_AA
MN908947.3 22028 G -AGTTCA 25 25 34 25 0 10 1 25 0.000322779 TRUE NA NA NA NA NA
When setting -q 0
, the P-value is set to 1
:
samtools mpileup -A -d 0 --reference nCoV-2019.reference.fasta -B -Q 0 mapped.primertrimmed.sorted.bam | \
grep -w 22028 | \
ivar variants -r nCoV-2019.reference.fasta -m 20 -p variants -q 0 -t 0.25 && column -t -s$'\t' variants.tsv
[mpileup] 1 samples in 1 input files
[mpileup] Max depth set to maximum value (2147483647)
A GFF file containing the open reading frames (ORFs) has not been provided. Amino acid translation will not be done.
REGION POS REF ALT REF_DP REF_RV REF_QUAL ALT_DP ALT_RV ALT_QUAL ALT_FREQ TOTAL_DP PVAL PASS GFF_FEATURE REF_CODON REF_AA ALT_CODON ALT_AA
MN908947.3 22028 G -AGTTCA 25 25 34 25 0 0 1 25 1 FALSE NA NA NA NA NA
Dear @gkarthik ,
I tried to change the option
-q
inivar variants
to values below the default 20 and discovered a problem with calling deletions once the q-score threshold is set to a value below 10. My data is SARS-CoV2 and the deletion in question is the del69/70 in spike at position 21764.With this command I tried all values for
-q
from 0 to 20:The output for position 21764 is:
For
-q 10
and above, the variant is found with P-value = 0, and for-q 9
and below, it is not found with P-value = 1.The main difference between the files is that the column ALT_QUAL takes on exactly the value set for
-q
, probably because of https://github.com/andersen-lab/ivar/blob/ce1490a70806ea33b225e9cb1b89f15cf0fa0271/src/allele_functions.cpp#L90This value is then used here https://github.com/andersen-lab/ivar/blob/ce1490a70806ea33b225e9cb1b89f15cf0fa0271/src/call_variants.cpp#L138 for calculating the P-value, and it seems that it somehow misbehaves at values below 10?