gmarcais / Jellyfish

A fast multi-threaded k-mer counter

Feature/Option Ideas #27

Open aconz2 opened 9 years ago

aconz2 commented 9 years ago

I've come across a few use cases for Jellyfish that were not practical with the current executable, so I wrote the functionality as a separate program, using Jellyfish as a library. I'd be happy to integrate these features into master if you feel they belong. They are as follows:

  1. Count only the k-mers found in a .jf file. This is currently an option, but if you want to do it for many different samples, you pay the overhead of re-counting the input file every time. Iterating the binary dump is a much quicker way to prime the hash.
  2. Use the hash matrix from another .jf file. This is useful when you'd like the output to be in the same sorted order as that file.
  3. Limit the count stored in the hash. Specifically, my use case has been to see which k-mers appear in a sample without caring about their counts. This would allow a single value bit to be used and prevent very large counts from taking up many entries. The "generalized" case is to disallow overflow of the counter, but the (easier) way I'm doing it now (specific to a single bit) is to first check whether the k-mer's value is already 1, and add 1 only if it is not.
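
To make items 1 and 3 concrete, here is a rough sketch of the idea in plain Python. It does not use the Jellyfish library or its file format: a `dict` stands in for the hash, a list of strings stands in for the k-mers read from a .jf dump, and the names `primed_table`, `count_saturating`, and `max_count` are made up for illustration. It also ignores canonical k-mers.

```python
from collections.abc import Iterable, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every substring of length k in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def primed_table(allowed: Iterable[str]) -> dict[str, int]:
    """Item 1: prime the table with the allowed k-mers, each starting at 0."""
    return {kmer: 0 for kmer in allowed}


def count_saturating(table: dict[str, int], seq: str, k: int,
                     max_count: int = 1) -> None:
    """Item 3: count only k-mers already in the table, never exceeding max_count.

    With max_count=1 this is the single-bit case: check whether the value
    is already 1, and add 1 only if it is not.
    """
    for kmer in kmers(seq, k):
        current = table.get(kmer)
        if current is not None and current < max_count:
            table[kmer] = current + 1


# Hypothetical stand-in for k-mers iterated out of a .jf binary dump.
allowed = ["ACG", "CGT", "TTT"]

presence = primed_table(allowed)
count_saturating(presence, "ACGACGT", k=3)               # presence/absence only

capped = primed_table(allowed)
count_saturating(capped, "ACGACGACGT", k=3, max_count=2)  # generalized cap
```

The same primed table can be rebuilt cheaply for each new sample, which is the point of item 1: the set of allowed k-mers is loaded once from an existing dump instead of being re-counted from the original input.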