bacpop / unitig-caller

Methods to determine sequence element (unitig) presence/absence
Apache License 2.0
18 stars 3 forks source link

Missing sequences in unitig-caller #2

Closed apredeus closed 4 years ago

apredeus commented 4 years ago

Hello John,

I've noticed a very strange behaviour of unitig-caller while trying to debug some interesting results I found with pyseer. Below is a picture of significant unitigs that were either mapped using blastn (with low complexity filters off), or counted from the output of unitig-counter, or called with unitig-caller. The latter seems to be missing lots of calls. I have ~ 3100 genomes, and used "--simple" mode - perhaps this was the problem?

image

johnlees commented 4 years ago

Thanks for reporting this, and sorry that I haven't had a chance to look at it until now. This does look suspicious, there's a factor of two which makes me wonder whether the reverse complements are being missed.

Looking at the code history, this was only added in 36690514a81deb247342117e87ec436cc5843209, aka v1.0.1. I see that bioconda only has v1.0.0, which would explain this error. This is my fault for forgetting to update it! I will do that now. If you use the --index and --call mode it should work

apredeus commented 4 years ago

Ah, that makes sense, thank you! And thanks for fixing it.

One more thing possibly worth looking at - unitig-counter seems to get few more isolates than blast does. The reason for it is "corner cases" when part of the unitig in question is in the very beginning or end of the contig. Especially makes a difference for longer unitigs with lower allele frequency.