endrebak closed this issue 8 years ago
It is a little more complicated than that. In case of a counter overflow, more than one entry in the hash table will be used for a k-mer. So there is no real upper bound on the count associated with a given k-mer, regardless of counter-len (as long as counter-len is > 0). counter-len == 0 can be used internally to represent a set.
In practice, you should use a counter-len large enough to accommodate the count of most of your k-mers. For example, if 2^(counter-len) > 2 * coverage, few k-mers will need 2 entries in the hash.
It will work whatever the setting used for counter-len, but you could pay a price in speed and memory consumption.
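To make the rule of thumb above concrete, here is a minimal Python sketch that finds the smallest counter length satisfying 2^(counter-len) > 2 * coverage. The function name `min_counter_len` is hypothetical and not part of Jellyfish; it only does the arithmetic.

```python
def min_counter_len(coverage):
    """Smallest number of bits such that 2**bits > 2 * coverage.

    Hypothetical helper illustrating the rule of thumb; not a Jellyfish API.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    bits = 1  # counter-len must be > 0 for counting
    while 2 ** bits <= 2 * coverage:
        bits += 1
    return bits


if __name__ == "__main__":
    for cov in (10, 30, 100):
        print(f"coverage {cov}x: counter-len >= {min_counter_len(cov)}")
```

A smaller value still works, as noted above, but more k-mers would then spill into a second hash entry.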
HTH.
That seems to fit with the testing I originally did (counter-len 1 took twice as long as the default settings).
By the way, are Jellyfish's k-mer counts exact? I compute the effective genome fraction as the number of unique k-mers in the genome divided by the genome length, and I get somewhat different results from the published ones (perhaps the published results from 2009/10 are inexact).
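For reference, the computation described above could be sketched like this in plain Python. It assumes "unique" means a k-mer that occurs exactly once, counts in memory with `collections.Counter` rather than with Jellyfish, and ignores reverse complements and `N` bases; the function name is hypothetical.

```python
from collections import Counter


def effective_genome_fraction(genome, k):
    """Number of k-mers occurring exactly once, divided by genome length.

    Assumption: "unique" means count == 1. Naive in-memory counting,
    only practical for small sequences.
    """
    if k <= 0 or len(genome) < k:
        raise ValueError("need 0 < k <= len(genome)")
    genome = genome.upper()
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / len(genome)


if __name__ == "__main__":
    print(effective_genome_fraction("ACGTACGA", 3))
```

Differences in how "unique" is defined, or in whether both strands are counted, could by themselves explain some discrepancy with published figures.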
The counts are supposed to be exact. I do hope I don't have a bug that lasted all these years.
I have no reason to think (any potential) mistakes are on your side. Thanks for the software!
Sounds like it should not be able to since it can only separate 0 and 1 (I'd test it myself, but our server is down...).