alexpreynolds / kmer-counter

Count kmers with a more efficient (faster) hash table
MIT License

Canonical kmers? #7

Closed by mmcguffi 4 years ago

mmcguffi commented 4 years ago

Would it be possible for you to add a "canonical kmer" option? E.g., as described in https://bioinfologics.github.io/post/2018/09/17/k-mer-counting-part-i-introduction/ (Section: Reverse complement and canonical k-mers)

It seems fairly trivial (famous last words) -- I would fork this and submit a pull request myself, but I am not versed in C++.

And I see that you have several repositories that are kmer counters that /do/ have this functionality -- did you find this hash table implementation to be the fastest in the end?

alexpreynolds commented 4 years ago

I made canonical output the default in commit bfe3ebb, using the definition of canonical provided in your linked post.
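For readers who have not followed the link: the definition there takes the canonical form of a kmer to be the lexicographically smaller of the kmer and its reverse complement. Below is a minimal sketch of counting under that definition. It is not the code from the commit; the function names (complement, reverse_complement, canonical, count_canonical) are hypothetical, and it uses std::unordered_map as a stand-in rather than the hash table this repo uses. Bases outside A, C, G, T are passed through unchanged, which is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>

// Complement of a single base. Anything outside ACGT is returned unchanged
// (an assumption of this sketch, not necessarily what kmer-counter does).
char complement(char base) {
    switch (base) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default:  return base;
    }
}

// Reverse the kmer, then complement every base.
std::string reverse_complement(const std::string& kmer) {
    std::string rc(kmer.rbegin(), kmer.rend());
    std::transform(rc.begin(), rc.end(), rc.begin(), complement);
    return rc;
}

// Canonical form: the lexicographically smaller of the kmer and its
// reverse complement.
std::string canonical(const std::string& kmer) {
    const std::string rc = reverse_complement(kmer);
    return std::min(kmer, rc);
}

// Count canonical kmers of length k over a single sequence.
// std::unordered_map is a stand-in for a faster hash table.
std::unordered_map<std::string, unsigned long>
count_canonical(const std::string& seq, std::size_t k) {
    std::unordered_map<std::string, unsigned long> counts;
    if (k == 0 || seq.size() < k) {
        return counts;
    }
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        ++counts[canonical(seq.substr(i, k))];
    }
    return counts;
}
```

With k = 2, the sequence ACGT contains AC, CG and GT. AC and GT are reverse complements of each other, so they collapse into AC with a count of 2, while CG is its own reverse complement and is counted once.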

If you run make clean and git pull, you should get the updated code.

I haven't looked at the other repos in a long while. This is probably the repo I will maintain.

A while back, I compared various hash tables across these repos and reported what I found in a (since deleted) comment on Bioinformatics SE. If I remember correctly, of the ones I reviewed, the one I use here offered the most balanced overall performance in time and space for this task.

I have an updated FASTA parser in kmer-boolean that I might work in here; it would help with very large inputs. Let me know if that would be of use.