biocore / deblur

Deblur is a greedy deconvolution algorithm based on known read error profiles.
BSD 3-Clause "New" or "Revised" License

Not sure where counts are going for dominant sequence #182

Closed by ghost 6 years ago

ghost commented 6 years ago

I'm trying to learn more about deblur and have run into a snag. When I run a dataset through deblur and have it keep all of the temporary files (i.e. with --keep-tmp-files), the last three temporary fasta files it generates end in msa, msa.deblur, and msa.deblur.no_chimeras. From my reading of the algorithm and the source code, the most abundant sequence should not change its abundance as it goes through the deblur algorithm. Yet when I run it, the count for that sequence drops by 61 (23633 to 23572) between the msa and msa.deblur files, as the headers below show. Any ideas where this could be happening?

data.trim.derep.no_artifacts.msa
>M00967_48_000000000-A3T88_1_2111_26505_9431;size=23633

data.trim.derep.no_artifacts.msa.deblur
>M00967_48_000000000-A3T88_1_2111_26505_9431;size=23572

data.trim.derep.no_artifacts.msa.deblur.no_chimeras
>M00967_48_000000000-A3T88_1_2111_26505_9431;size=23572
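
For anyone wanting to repeat this check, here is a minimal Python sketch of the comparison. It assumes only that each header carries a `;size=N` abundance annotation, as in the three headers above. The helper name `read_sizes` is made up for illustration, and the headers are pasted in as literals rather than read from the actual temporary files.

```python
import re

# Matches the ';size=N' abundance annotation seen in the headers above.
SIZE_RE = re.compile(r";size=(\d+)")

def read_sizes(lines):
    """Map each FASTA record ID to the abundance in its ';size=' annotation.

    Hypothetical helper for illustration; it is not part of deblur.
    """
    sizes = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].strip()
        match = SIZE_RE.search(header)
        if match:
            sizes[header[:match.start()]] = int(match.group(1))
    return sizes

seq_id = "M00967_48_000000000-A3T88_1_2111_26505_9431"

# Header lines copied from the three temporary files listed above.
stages = {
    "msa": [">" + seq_id + ";size=23633"],
    "msa.deblur": [">" + seq_id + ";size=23572"],
    "msa.deblur.no_chimeras": [">" + seq_id + ";size=23572"],
}

counts = {stage: read_sizes(lines)[seq_id] for stage, lines in stages.items()}
lost_in_deblur = counts["msa"] - counts["msa.deblur"]
lost_in_chimera_step = counts["msa.deblur"] - counts["msa.deblur.no_chimeras"]

print(lost_in_deblur, lost_in_chimera_step)  # prints: 61 0
```

The whole drop happens between the msa and msa.deblur files, and the chimera filtering step leaves the count unchanged.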

ghost commented 6 years ago

Sorry, after a few more minutes of reading through past issues, I think I figured out that this is related to #161. I had assumed the implementation worked as described in the paper, considering only the triangle of the matrix rather than the full matrix.
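
To illustrate the distinction, here is a toy Python sketch. It is not deblur's code: the function `subtract_errors`, the three sequences, and the error rates are all made up for illustration. It assumes "the matrix" means the pairwise table of source and target sequences ordered by abundance. With only one triangle, a sequence subtracts expected error reads from less abundant sequences only. With the full matrix, it subtracts from every other sequence, including more abundant ones.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def subtract_errors(counts, error_rates, full_matrix):
    """Toy error subtraction over sequences sorted by abundance.

    counts:      dict mapping sequence -> observed count
    error_rates: dict mapping Hamming distance -> hypothetical fraction of a
                 source sequence's reads expected at that distance
    full_matrix: if False, each source only corrects less abundant sequences
                 (one triangle); if True, it corrects every other sequence.
    """
    ordered = sorted(counts, key=counts.get, reverse=True)
    result = dict(counts)
    for i, src in enumerate(ordered):
        if result[src] <= 0:
            continue  # nothing left to generate errors from
        targets = ordered if full_matrix else ordered[i + 1:]
        for dst in targets:
            if dst == src:
                continue
            dist = hamming(src, dst)
            if dist in error_rates:
                result[dst] -= error_rates[dist] * result[src]
    return result

# Made-up toy data: one dominant sequence and two close neighbours.
dominant = "ACGTACGT"
toy_counts = {dominant: 1000, "ACGTACGA": 100, "ACGTACCA": 10}
toy_rates = {1: 0.05, 2: 0.01}  # hypothetical error profile

triangle = subtract_errors(toy_counts, toy_rates, full_matrix=False)
full = subtract_errors(toy_counts, toy_rates, full_matrix=True)

print(triangle[dominant])  # dominant count unchanged
print(full[dominant])      # dominant count reduced by its neighbour
```

In the triangle version the dominant sequence keeps its original count, which is what I was expecting. In the full-matrix version the less abundant neighbour takes a small amount away from it, which is the kind of loss I was seeing.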