dib-lab / khmer

In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more
http://khmer.readthedocs.io/

k value and hashing #160

Closed meznah closed 10 years ago

meznah commented 11 years ago

To the best of my knowledge, khmer supports k values <= 32 with no hash collisions. I am wondering whether khmer has a mechanism to support k > 32 while still never hashing different k-mers to the same value?

ctb commented 11 years ago

On Tue, Sep 10, 2013 at 08:01:09AM -0700, meznah wrote:

To the best of my knowledge, khmer supports k values <= 32 with no hash collisions. I am wondering whether khmer has a mechanism to support k > 32 while still never hashing different k-mers to the same value?

Hi Meznah,

See also #27. Interestingly, while we have plans to support multiple hash functions, including ones that allow k > 32, we don't have any plans for exact hash functions for k > 32 at the moment. There are some technical problems with doing so (a long long is 64 bits, which at 2 bits per base holds at most k=32), but we could think about it. Could you tell us what uses you have in mind?

thanks,

--t

C. Titus Brown, ctb@msu.edu
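
The 64-bit limit mentioned in the reply above comes from packing each base into 2 bits, so 32 bases fill a 64-bit integer exactly and every k-mer gets a distinct value. Here is a minimal sketch of that idea in Python. The A=0, C=1, G=2, T=3 encoding and the helper names `pack_kmer` and `unpack_kmer` are assumptions made for illustration; they are not khmer's API, and khmer's own encoding may differ.

```python
# Illustrative 2-bit packing of a k-mer into a 64-bit integer.
# The encoding table and function names are hypothetical, not khmer's.
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODING = {code: base for base, code in ENCODING.items()}

MAX_K = 32  # 32 bases * 2 bits per base = 64 bits


def pack_kmer(kmer):
    """Return the exact (collision-free) integer value of a k-mer, k <= 32."""
    if len(kmer) > MAX_K:
        # Python integers are unbounded, so the 64-bit limit is enforced
        # explicitly here to mirror a fixed-width unsigned long long.
        raise ValueError("k-mer longer than 32 bases does not fit in 64 bits")
    value = 0
    for base in kmer.upper():
        value = (value << 2) | ENCODING[base]
    return value


def unpack_kmer(value, k):
    """Invert pack_kmer for a k-mer of known length k."""
    bases = []
    for _ in range(k):
        bases.append(DECODING[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))
```

Because the mapping is invertible, two different k-mers of the same length can never share a value. For k > 32 the value no longer fits in one 64-bit word, so any 64-bit hash of such k-mers can collide.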

meznah commented 11 years ago

Some papers suggest that the larger k is, the more useful a sparseness or tagging approach will be. I was just thinking of exploring the effect of the k value on sparseness in the future, but not at the moment.

Issue #27 is closely related and has made me aware of more things to consider. Is there any way to search the issues by keyword, to avoid opening unnecessary issues?

ctb commented 11 years ago

On Tue, Sep 10, 2013 at 08:43:22AM -0700, meznah wrote:

Some papers suggest that the larger k is, the more useful a sparseness or tagging approach will be. I was just thinking of exploring the effect of the k value on sparseness in the future, but not at the moment.

Ahh, OK. In simulations you should be able to develop an idea of what moving k from 20 to 32 will do, for example, but I see your point that longer might be better.

Issue #27 is closely related and has made me aware of more things to consider. Is there any way to search the issues by keyword, to avoid opening unnecessary issues?

Yes, the search box at the top of the page searches only this project, not GitHub as a whole; that's what I've been using. But duplicate issues are not a huge problem, so don't worry about it too much!

--t

C. Titus Brown, ctb@msu.edu
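
The kind of simulation suggested in the reply above, comparing k=20 with k=32, might be sketched as follows. This is only a rough illustration under stated assumptions: a uniformly random genome, uniformly sampled reads, and a simple substitution error model. The function names and all parameter values are hypothetical, and the sketch does not use khmer itself.

```python
import random


def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def simulate_reads(genome, n_reads, read_len, error_rate, rng):
    """Sample reads uniformly and randomize each base with some probability.

    A randomized base may equal the original, so the effective
    substitution rate is about 3/4 of error_rate.
    """
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        read = [
            rng.choice("ACGT") if rng.random() < error_rate else base
            for base in genome[start:start + read_len]
        ]
        reads.append("".join(read))
    return reads


def summarize(k_values=(20, 24, 28, 32), seed=42):
    """For each k, count genome k-mers, read k-mers, and erroneous read k-mers."""
    rng = random.Random(seed)
    genome = "".join(rng.choice("ACGT") for _ in range(5000))
    reads = simulate_reads(genome, 500, 100, 0.01, rng)
    rows = []
    for k in k_values:
        true_kmers = kmers(genome, k)
        read_kmers = set()
        for read in reads:
            read_kmers |= kmers(read, k)
        erroneous = read_kmers - true_kmers
        rows.append((k, len(true_kmers), len(read_kmers), len(erroneous)))
    return rows


if __name__ == "__main__":
    for k, n_true, n_seen, n_err in summarize():
        print(f"k={k}: genome={n_true} reads={n_seen} erroneous={n_err}")
```

Comparing the rows across k values gives a first impression of how the number of distinct and erroneous k-mers changes with k. A real study would need realistic genomes with repeats and a more faithful error model.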

mr-c commented 10 years ago

Does this issue need to be left open?

ctb commented 10 years ago

No, dup with #27.