CSB5 / lofreq

LoFreq Star: Sensitive variant calling from sequencing data
http://csb5.github.io/lofreq/

Lofreq viterbi #144

Open rahil19 opened 2 months ago

rahil19 commented 2 months ago
Hi, I recently used LoFreq v2.1.5 to call variants in HIV WGS data. Before running LoFreq, I aligned the reads with BWA and used samtools to keep only properly paired alignments. I then ran the LoFreq commands in this order: viterbi --> indelqual --> alnqual --> call, and noticed that it made some frameshift calls. The attached IGV screenshots show the alignment at one of these frameshift sites: the top panel is before LoFreq preprocessing and the bottom panel is after. As you can see, the viterbi step introduces an insertion and a deletion on the same reads, which results in a frameshift insertion and a frameshift deletion, each reported in about 29% of the reads:

| Sample | HGVS.g | HGVS.c | HGVS.p | lofreq | Variant_Type | lofreq_Var_Count |
| --- | --- | --- | --- | --- | --- | --- |
| A | NC_001802.1:g.5212_5213insCC | HIV1gp4:c.108_109insCC | vpr:p.Ile37fs | 0.290914 | frameshift_variant | 3269 |
| A | NC_001802.1:g.5214_5215delTT | HIV1gp4:c.111_112delTT | vpr:p.Ile37fs | 0.295586 | frameshift_variant | 3268 |
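To make the "insertion and deletion on the same read" observation concrete, here is a minimal Python sketch (not LoFreq code) that walks a read's CIGAR string, lists its indel events in reference coordinates, flags reads carrying an insertion and a deletion within a few bases of each other, and sums their net length change. The function names, the 5-base window, and the example read (POS 5150, CIGAR `63M2I1M2D80M`) are hypothetical; the CIGAR was chosen so the events land at the positions in the table.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_in_cigar(pos, cigar):
    """Return (ref_pos, op, length) for each I/D operation in a CIGAR string.

    pos is the 1-based leftmost reference position (SAM POS).
    For an insertion, ref_pos is the reference base just before it;
    for a deletion, ref_pos is the first deleted reference base.
    """
    events = []
    ref = pos  # next reference position to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I":
            events.append((ref - 1, "I", n))
        elif op == "D":
            events.append((ref, "D", n))
            ref += n
        elif op in "M=XN":
            ref += n
        # S, H and P consume no reference bases.
    return events


def paired_ins_del(pos, cigar, window=5):
    """True if the read has an insertion and a deletion within `window` ref bases."""
    events = indels_in_cigar(pos, cigar)
    ins = [e for e in events if e[1] == "I"]
    dels = [e for e in events if e[1] == "D"]
    return any(abs(i[0] - d[0]) <= window for i in ins for d in dels)


def net_length_change(events):
    """Inserted bases count +n, deleted bases count -n."""
    return sum(n if op == "I" else -n for _, op, n in events)


# Hypothetical read chosen to match the two calls above.
read_pos, read_cigar = 5150, "63M2I1M2D80M"
events = indels_in_cigar(read_pos, read_cigar)
print(events)
print(paired_ins_del(read_pos, read_cigar), net_length_change(events))
```

With pysam, `AlignedSegment.reference_start` is 0-based, so you would pass `read.reference_start + 1` together with `read.cigarstring`. A net change of 0 (+2 inserted, -2 deleted) means the pair together keeps the reading frame, even though each event on its own is annotated as a frameshift.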

Because the insertion and the deletion are on the same reads, this looks more like an artifact than a real variant. How can I fix this? Should I remove the viterbi step? If so, should I still run the indelqual and alnqual steps?

I also found regions where BWA aligned no reads at all (second attached figure). If I gave LoFreq the raw BWA alignment BAM (which still contains the unmapped reads) instead of the BAM filtered for proper pairs, could the viterbi step realign the unmapped reads into those gaps?
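The proper-pair filter decides which reads reach LoFreq at all, so it bears on this question. As a minimal sketch, assuming the filter was something like `samtools view -f 2 -F 4` (the exact command isn't given above), the following Python decodes SAM FLAG values using the bit definitions from the SAM specification and shows which reads such a filter keeps. The helper names and the example FLAG values are hypothetical.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification.
PROPER_PAIR = 0x2  # each segment properly aligned according to the aligner
UNMAPPED = 0x4     # this segment is unmapped


def classify(flag: int) -> str:
    """Put one SAM FLAG value into a coarse category.

    The unmapped bit is checked first because the SAM spec says bit 0x2
    cannot be relied on when 0x4 is set.
    """
    if flag & UNMAPPED:
        return "unmapped"
    if flag & PROPER_PAIR:
        return "proper_pair"
    return "mapped_not_proper"


def kept_by_filter(flag: int) -> bool:
    """Mimic the assumed `samtools view -f 2 -F 4`: require 0x2, exclude 0x4."""
    return bool(flag & PROPER_PAIR) and not (flag & UNMAPPED)


# Hypothetical FLAG values for a few reads.
flags = [99, 147, 77, 73, 133]
print(Counter(classify(f) for f in flags))
print([f for f in flags if kept_by_filter(f)])
```

In a SAM file the FLAG is the second column of each record (in pysam, `AlignedSegment.flag`). Under this assumed filter, reads in the `unmapped` category are removed before `lofreq viterbi` ever sees the BAM.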

[Figure 1: Lofreq_viterbi_realignment_issue]
[Figure 2: No_read_mapping_region]