andreas-wilm / lofreq3

LoFreq Version 3
MIT License

Not calling insertions on NC_000912_Mpneumoniae #38

Open andreas-wilm opened 3 years ago

andreas-wilm commented 3 years ago

Deletions are called fine, but insertions are missed, e.g. NC_000912.1:782755 T>TAG.

See /data/out/NC_000912_Mpneumoniae

The indel quality shows up as 0:

lofreq call -f $REFFA -b NC_000912_Mpneumoniae_comb.lf3.bam -r NC_000912.1:782754-782755 --loglevel 3 -p -P

andreas-wilm commented 3 years ago

LF2.15 does predict 782755 T>TAG on the BAM files processed by either version:

$ zgrep 78275 *vcf.gz
NC_000912_Mpneumoniae_comb.lf215.lf215.vcf.gz:NC_000912.1       782755  .       T       TAG     589     PASS    DP=100;AF=0.200000;SB=3;DP4=41,40,8,12;INDEL;HRUN=1
NC_000912_Mpneumoniae_comb.lf3.lf215.vcf.gz:NC_000912.1 782755  .       T       TAG     589     PASS    DP=100;AF=0.200000;SB=3;DP4=41,40,8,12;INDEL;HRUN=1
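For reference, VCF writes an insertion with the preceding (anchor) reference base in REF and the anchor plus the inserted bases in ALT, so T>TAG at 782755 is an insertion of AG after the T. The following is a minimal Python sketch for pulling such records apart. It assumes normalized, tab-separated, single-ALT records, and the helper names `parse_record` and `classify_indel` are hypothetical (not part of LoFreq):

```python
def parse_record(line: str) -> dict:
    """Split the 8 fixed VCF columns; INFO flags (e.g. INDEL) map to True."""
    chrom, pos, _id, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        info_fields[key] = value if sep else True
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "qual": qual, "filter": filt, "info": info_fields}


def classify_indel(ref: str, alt: str) -> tuple:
    """Return ('insertion' | 'deletion', affected bases) for an anchored indel."""
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion", alt[len(ref):]
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion", ref[len(alt):]
    raise ValueError(f"not a simple anchored indel: {ref}>{alt}")


# The LF2.15 record from above, without the zgrep filename prefix
record = parse_record("NC_000912.1\t782755\t.\tT\tTAG\t589\tPASS\t"
                      "DP=100;AF=0.200000;SB=3;DP4=41,40,8,12;INDEL;HRUN=1")
print(classify_indel(record["ref"], record["alt"]))  # ('insertion', 'AG')
```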

No difference in BI values between the two BAM files:

diff -u <(samtools view  NC_000912_Mpneumoniae_comb.lf3.bam  NC_000912.1:782754-782755  | awk '$6 ~ /[DI]/' | grep -o 'BI:Z:[^[:space:]]*') <(samtools view  NC_000912_Mpneumoniae_comb.lf215.bam  NC_000912.1:782754-782755  | awk '$6 ~ /[DI]/' | grep -o 'BI:Z:[^[:space:]]*')

However, the BI value there is actually zero, i.e. "!" (quality 0 in Phred+33 encoding):

samtools view  NC_000912_Mpneumoniae_comb.lf3.bam  NC_000912.1:782754-782755  | awk '$6 ~ /[DI]/' | grep -o 'BI:Z:[^[:space:]]*'
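Like the QUAL column, BI:Z holds one Phred+33 ASCII character per read base. That means "!" (ASCII 33) is quality 0, and "M" (ASCII 77) would be 44, the IQ value in the pileup below. The following is a tiny Python helper for decoding values printed by the command above. The names `tag_value` and `decode_phred33` are hypothetical, and the tag string in the example is made up:

```python
def tag_value(field: str) -> str:
    """Return the value of a SAM optional field such as 'BI:Z:...'."""
    _tag, _type, value = field.split(":", 2)
    return value


def decode_phred33(quals: str) -> list:
    """Decode a Phred+33 string into integer qualities ('!' -> 0, 'M' -> 44)."""
    return [ord(c) - 33 for c in quals]


print(decode_phred33(tag_value("BI:Z:MMM!!MM")))  # [44, 44, 44, 0, 0, 44, 44]
```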

Yet the LF2.15 pileup doesn't show those zero values:

lofreq plpsummary -f $REFFA  NC_000912_Mpneumoniae_comb.lf3.bam -r NC_000912.1:782754-782755 --call-indels

+AG   IQ =     44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44 44
+AG   MQ =     60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60 60
+AG   AQ =     36 41 42 41 42 40 41 42 12 39 30 38 40 36 36 42 42 31 20 39

Checking which values LF2.15 reads from ai:Z shows that the zeros are placeholders for the inserted bases themselves, and that the insertion quality is taken from the base before them. So why does this work for deletions?
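The observation above can be illustrated with a hypothetical Python sketch. It walks a CIGAR string together with a per-base indel quality string (BI:Z here, as in the samtools commands above). For each insertion, it reports both the values at the inserted bases (the "!" placeholders) and the value at the preceding anchor base, which is the one LF2.15 appears to use. This is not LoFreq code; `insertion_quals` is a made-up name and the read is invented:

```python
import re

# CIGAR as (length, op) pairs, e.g. "3M2I2M" -> [("3", "M"), ("2", "I"), ("2", "M")]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # ops that advance along SEQ (and hence BI)


def insertion_quals(cigar: str, bi: str) -> list:
    """For each insertion, return (query offset of the first inserted base,
    BI quals at the inserted bases, BI qual at the preceding anchor base)."""
    hits = []
    qpos = 0  # current offset into SEQ / the BI string
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "I":
            inserted = [ord(c) - 33 for c in bi[qpos:qpos + n]]
            anchor = ord(bi[qpos - 1]) - 33 if qpos > 0 else None
            hits.append((qpos, inserted, anchor))
        if op in QUERY_CONSUMING:
            qpos += n  # D, N, H and P do not consume query bases
    return hits


# Made-up read: 3 matched bases, a 2-base insertion ("!!"), 2 matched bases
print(insertion_quals("3M2I2M", "MMM!!MM"))  # [(3, [0, 0], 44)]
```

In this toy case, reading BI at the first inserted base yields 0 while the anchor base yields 44. Reading at the wrong one of these two adjacent offsets would be one way to end up with the zero indel quality reported above.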

andreas-wilm commented 3 years ago

Looks like there is an off-by-one error for deletions as well; it's unclear why they worked there. This needs full testing against LF2.15 on simulated data, but see also https://github.com/andreas-wilm/lofreq3/issues/40
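The following minimal Python sketch shows one possible comparison step for such testing. It keys each call by (CHROM, POS, REF, ALT) and lists the calls found by only one of the two versions. All function names are hypothetical. Matching on exact REF/ALT assumes both versions report indels in the same normalized representation:

```python
import gzip


def read_vcf_lines(path: str) -> list:
    """Read a plain or gzip/bgzip-compressed VCF as text lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return fh.readlines()


def call_keys(vcf_lines) -> set:
    """Key every call by (CHROM, POS, REF, ALT), splitting multi-allelic ALTs."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            keys.add((chrom, int(pos), ref, allele))
    return keys


def compare_calls(lf3_lines, lf215_lines) -> dict:
    """Partition calls into LF3-only, LF2.15-only and shared."""
    lf3, lf215 = call_keys(lf3_lines), call_keys(lf215_lines)
    return {
        "only_lf3": sorted(lf3 - lf215),
        "only_lf215": sorted(lf215 - lf3),
        "shared": sorted(lf3 & lf215),
    }
```

For example, you could call `compare_calls(read_vcf_lines(lf3_vcf), read_vcf_lines(lf215_vcf))` for each simulated data set. An insertion such as T>TAG landing in `only_lf215` would reproduce this issue.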