Closed: rsharris closed this issue 5 years ago
Hi Bob, thanks very much for spotting this. bcalm outputs perfectly circular unitigs with the first kmer equal to the last kmer, so that kmer appears twice. This is indeed a bug. I'll work on fixing it.
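For anyone who wants to spot affected unitigs in their own output, here is a minimal Python sketch of the symptom described above. It is not bcalm code: the function name `first_equals_last_kmer` is hypothetical, and the default k=21 is simply the value used in this report.

```python
def first_equals_last_kmer(unitig: str, k: int = 21) -> bool:
    """Return True if the unitig starts and ends with the same k-mer.

    A unitig of length <= k holds at most one k-mer, so it cannot
    repeat its first k-mer at the end and is reported as False.
    """
    if len(unitig) <= k:
        return False
    # Compare the leading and trailing windows of length k.
    return unitig[:k] == unitig[-k:]


# Toy example with k=2 (real data would use k=21).
print(first_equals_last_kmer("ACGTAC", k=2))  # first "AC", last "AC"
print(first_equals_last_kmer("ACGTA", k=2))   # first "AC", last "TA"
```

This compares the two kmers on the same strand only. Whether the comparison should also consider reverse complements is left open here.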
I should also have written that this isn't causing me any grief.
Also, in retrospect, it's unclear from my report whether the problem occurs in step 1 or 2 (or both).
Replacing step 2 with an equivalent using jellyfish reveals that the problem did occur during step 1.
The bug should be fixed now.
I'm seeing a few duplicate kmers in my output, which would seem to be unintended according to this statement in the GitHub README:
My bcalm reports its version as "BCALM 2, git commit 8137cc2, gatb-core version 1.4.1". I built from source fetched from --branch v2.2.1 a couple of days ago.
The rate of duplicates is so low it makes me wonder if I have done something wrong. Exactly 100 duplicate 21-mers out of 140 million.
I had no luck creating a small example. But this is reproducible starting with SRR957915.sra from the short read archive. Here's what I did:
(1) Get abundance 2 21-mers and strip the non-names from the fasta headers.
(2) I was expecting the results of step 1 to contain no duplicated 21-mers, so asking bcalm for abundance 2 21-mers from that result should produce an empty file. Instead I end up with 100 unitigs of length 21.
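The check in step 2 can also be sketched independently of bcalm. The Python below is a rough illustration, not the commands used in this report: the helper names `canonical` and `duplicate_kmers` are hypothetical, and it assumes a kmer and its reverse complement count as the same kmer. It keeps every kmer in memory, so it is only practical for small inputs, not for 140 million 21-mers.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def duplicate_kmers(sequences, k: int = 21) -> dict:
    """Count canonical k-mers over all sequences and return those seen more than once."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of length k across the sequence.
        for i in range(len(seq) - k + 1):
            counts[canonical(seq[i:i + k])] += 1
    return {kmer: n for kmer, n in counts.items() if n > 1}


# Toy example with k=3: "ACG" and "CGT" are reverse complements of each other.
print(duplicate_kmers(["ACGTT", "AACG"], k=3))
```

If the unitig set really contained each kmer once, the returned dictionary would be empty.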