Closed capr closed 5 years ago
You can use the following. It is a little slow, but hash table is usually much slower than hashing.
uint64_t hash_64(uint64_t key)
{
key = ~key + (key << 21);
key = key ^ key >> 24;
key = (key + (key << 3)) + (key << 8);
key = key ^ key >> 14;
key = (key + (key << 2)) + (key << 4);
key = key ^ key >> 28;
key = key + (key << 31);
return key;
}
Hi @attractivechaos, thanks for answering and for the hash function. I still have trouble figuring this out. For instance you say that
hash table is usually much slower than hashing
Ok, so here's my results with the above hash_64
function:
key size: 4, inserts: 1000000, unique keys: 25%, mil/sec: 49.091
key size: 4, lookups: 1000000, unique keys: 25%, mil/sec: 307.542
key size: 8, inserts: 1000000, unique keys: 25%, mil/sec: 19.614
key size: 8, lookups: 1000000, unique keys: 25%, mil/sec: 39.554
8x slower than with 32bit keys, which is fine given that twice as many 32bit keys fit into the CPU caches than 64bit keys, the 32bit hash function is a noop, etc.
But then if I switch to simpler 64bit multiplicative hash I get the following:
key size: 4, inserts: 1000000, unique keys: 25%, mil/sec: 46.969
key size: 4, lookups: 1000000, unique keys: 25%, mil/sec: 308.119
key size: 8, inserts: 1000000, unique keys: 25%, mil/sec: 20.004
key size: 8, lookups: 1000000, unique keys: 25%, mil/sec: 109.173
So 1.5x speed just by changing the hash function.
Now if I switch khint_t
to int64_t
and use the identity hash function (so no collisions), I get this:
key size: 4, inserts: 1000000, unique keys: 25%, mil/sec: 49.293
key size: 4, lookups: 1000000, unique keys: 25%, mil/sec: 322.876
key size: 8, inserts: 1000000, unique keys: 25%, mil/sec: 34.743
key size: 8, lookups: 1000000, unique keys: 25%, mil/sec: 184.550
This is probably as good as it gets.
I am still curious about other people's experience with using different hash functions because:
1) the theory of why the results vary so wildly is above my head, although obviously less collisions is better than more collisions on any implementation. 2) you don't mention or seem to have tested khash using different hash functions in your blog posts.
Now it's quite possible that the slowness is dude to a bug in my code, but I think it would still help to know if other users of khash have had a similar experience or it's just me.
Microbenchmarks are tricky. Showing numbers means little without the source code.
You're right. I wasn't showing the code because it's not in C, it's actually a port of khash in Terra but I'm not sure that it matters much, unless it's a translation bug (Terra has the same memory model as C it being a LLVM frontend). Anyway here it is, FWIW: https://github.com/luapower/khash.
Hi!
I noticed that khash is very sensitive to the choice of hash function for >= 64bit keys (not so for 32bit keys). In particular
kh_int64_hash_func()
is extremely slow with random input. Does anyone have a similar experience? Is it expected to have wildly varying results based on the choice of hash function?Thanks!