JesperAxelsson / rust-intmap

Specialized hashmap for u64 keys
MIT License
29 stars 10 forks source link

How was the prime number chosen? #45

Closed inorick closed 1 year ago

inorick commented 1 year ago

This is more of a question than an issue.

I was curious how the prime 11400714819323198549 was chosen. The Readme states that it is "the largest prime for u64". However, the largest prime below u64::MAX is 18446744073709551557 which is larger.

In any case, it would be interesting how the choice of the prime affects performance.

sigaloid commented 1 year ago

It's actually the closest prime number to the golden ratio's conjugate ($\frac{1}{\phi}$) when divided by u64::MAX, it's an optimization called "Fibonacci hashing". It isn't used a ton in real world implementations despite being more performant. I remember reading about it in Knuth's TAOCP.

The magic number here divided by u64::MAX: $\frac{11400714819323198549}{18446744073709551615} = \frac{1}{\phi} = 0.6180339887498948516559505626598896683193831281448476224881309663$

Basically, phi may (or may not) be the most irrational number (and if it's not, it's still a pretty good choice). Hence, the distribution with it is very high quality. When wrapping_mult on it, the distribution is highly random.

Here are some links: https://stackoverflow.com/a/33871291 https://book.huihoo.com/data-structures-and-algorithms-with-object-oriented-design-patterns-in-c++/html/page214.html https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/ https://www.cs.helsinki.fi/linux/linux-kernel/2002-00/1122.html

inorick commented 1 year ago

Thank you for the instructive links. It makes sense now.