There are two fixes included in this PR which together improve key distribution in the DefaultNodeLocator significantly.
Fix 1: The FNV hashes in FnvHash don't initialize themselves when the hash objects are constructed, so the first hash produced by each instance is computed from an under-randomized starting value. It seems these were written with the assumption that either the Initialize method would be called by the built-in ComputeHash method (which it is, but only after the first hash is computed, to prepare for the next one), or that the IUIntHashAlgorithm.ComputeHash method would be used, which performs the initialization. Neither is true in the DefaultNodeLocator.
This change is backed by an additional test that verifies the FNV1a algorithm against published test vectors.
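For illustration, here is a minimal, self-contained FNV-1a (32-bit) sketch. It is not the library's FnvHash class; it just shows the algorithm seeded from the offset basis on every call, checked against two published FNV-1a test vectors, so the uninitialized-state problem cannot arise in this form.

```csharp
using System;
using System.Text;

// Standalone FNV-1a (32-bit) sketch -- illustrative only, not the library's FnvHash.
static class Fnv1aSketch
{
    const uint OffsetBasis = 2166136261; // 0x811C9DC5: the state every hash must start from
    const uint Prime = 16777619;         // 0x01000193

    public static uint Hash(string s)
    {
        // The state is (re)initialized here on every call, so the "first hash is
        // under-randomized" problem described above cannot occur in this sketch.
        uint hash = OffsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            hash ^= b;                      // FNV-1a: XOR the byte in first...
            hash = unchecked(hash * Prime); // ...then multiply, wrapping at 32 bits
        }
        return hash;
    }

    public static void Main()
    {
        // Published FNV-1a 32-bit test vectors
        Console.WriteLine(Hash("") == 0x811C9DC5u);  // True: "" hashes to the offset basis
        Console.WriteLine(Hash("a") == 0xE40C292Cu); // True
    }
}
```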
Fix 2: I've been noticing fairly uneven distribution of keys across nodes in 3-, 6-, 9-, and 12-node memcached clusters. I've tracked that down to the implementation of DefaultNodeLocator, which populates a consistent hash ring by taking each memcached node's hostname, appending suffixes like "-1", "-2", ... "-100", and hashing the results with FNV1a to create 100 positions per node on the ring.
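As a rough sketch of that scheme (hypothetical host strings and a simplified ring structure, not the library's actual DefaultNodeLocator code; it reuses the Hash helper from the FNV-1a sketch above):

```csharp
using System.Collections.Generic;

// Suffix-based ring population as described above (simplified sketch).
var ring = new SortedDictionary<uint, string>();
foreach (var host in new[] { "cache-a:11211", "cache-b:11211", "cache-c:11211" })
{
    for (int i = 1; i <= 100; i++)
    {
        // 100 ring positions per node, keyed by the hash of "hostname-i"
        ring[Fnv1aSketch.Hash(host + "-" + i)] = host;
    }
}
// A cache key is then mapped to the node owning the first ring position >= Hash(key),
// wrapping around to the lowest position at the end of the ring.
```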
The test DefaultNodeLocatorTest.TestLocator measures the variability from ideal with the implemented DefaultNodeLocator; e.g., if you had 10 keys and 10 servers, ideally you'd get 1 key per server, for a variability of 0%. The current implementation shows a variability of 76% in this test: with 1,000,000 keys distributed over 8 servers, the luckiest server gets only 52,897 keys and the unluckiest gets 219,437, where the ideal would be 125,000 keys per server (a simplified sketch of this measurement follows the test output below).
Expected about 125000 keys per server; got 89676 for server 0; variation: 28.3%
Expected about 125000 keys per server; got 140354 for server 1; variation: 12.3%
Expected about 125000 keys per server; got 198143 for server 2; variation: 58.5%
Expected about 125000 keys per server; got 52897 for server 3; variation: 57.7%
Expected about 125000 keys per server; got 97068 for server 4; variation: 22.3%
Expected about 125000 keys per server; got 53088 for server 5; variation: 57.5%
Expected about 125000 keys per server; got 219437 for server 6; variation: 75.5%
Expected about 125000 keys per server; got 149337 for server 7; variation: 19.5%
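A simplified sketch of that measurement (not the actual DefaultNodeLocatorTest.TestLocator code; it reuses the ring and Hash helper from the sketches above, and uses a linear scan where a real locator would binary-search the sorted positions):

```csharp
using System;
using System.Linq;
using System.Collections.Generic;

// Distribute a large number of keys over the ring and report each server's
// deviation from the ideal share of totalKeys / serverCount.
const int totalKeys = 1_000_000;
int serverCount = ring.Values.Distinct().Count();
var counts = new Dictionary<string, int>();
var rng = new Random(1);

for (int k = 0; k < totalKeys; k++)
{
    uint h = Fnv1aSketch.Hash("key:" + rng.Next());
    // First ring position at or above the key's hash, wrapping to the lowest position.
    string host = ring.FirstOrDefault(p => p.Key >= h).Value ?? ring.First().Value;
    counts[host] = counts.TryGetValue(host, out var c) ? c + 1 : 1;
}

double ideal = (double)totalKeys / serverCount;
foreach (var pair in counts)
    Console.WriteLine($"{pair.Key}: {pair.Value} keys, variation {Math.Abs(pair.Value - ideal) / ideal:P1}");
```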
The root cause of this problem is that the FNV-1a hash algorithm is insensitive to changes at the end of the string:
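A quick way to see why (a standalone sketch reusing the Hash helper above; the host string is hypothetical): when two inputs share everything but the trailing byte, their FNV-1a values are (h ^ b1) * prime and (h ^ b2) * prime for the same running state h, so they differ only by a small multiple of the prime and land close together on the 32-bit ring.

```csharp
// "cache-a:11211-1" ... "cache-a:11211-9" differ only in the final byte, so their
// hashes are (h ^ '1')*prime, (h ^ '2')*prime, ... for one fixed h. For these
// single-digit suffixes the spread is at most ~15 * 16777619, roughly 6% of the
// 2^32 ring, so a node's positions bunch together instead of spreading out.
for (int i = 1; i <= 9; i++)
{
    Console.WriteLine($"cache-a:11211-{i} -> 0x{Fnv1aSketch.Hash("cache-a:11211-" + i):X8}");
}
```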
And modifying the end of the string is exactly what the node locator does for populating the consistent hash ring.
To address this, I've made a change that is simple in principle: I've changed the node ring to be populated as "1-hostname", "2-hostname", etc.
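In the ring-population sketch above, the change amounts to moving the counter to the front of the string, so FNV-1a mixes the varying part through every subsequent byte (again illustrative, not the library's exact code):

```csharp
// Before: counter appended, so only the tail of the input varies (clusters on the ring)
ring[Fnv1aSketch.Hash(host + "-" + i)] = host;

// After: counter prepended, so every byte after it sees a different running state
ring[Fnv1aSketch.Hash(i + "-" + host)] = host;
```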
Here's a comparison of different options I evaluated:
FNV1a, suffixed number: max 76% from ideal distribution
FNV1a, prefixed number: max 16% from ideal distribution
FNV1a, prefixed & suffixed number: max 22% from ideal distribution
MurmurHash: max 17% from ideal distribution
SHA256: max 11% from ideal distribution (as a baseline; this is likely too expensive from a performance perspective)
This change seemed like the lowest-risk, smallest code change with the best outcome.
There is one risk to communicate with this change -- when deployed in an existing application, the key->node distributions will be completely altered, resulting in a high initial rate of cache misses. The distribution fix is probably worth it, but that could be a risk that library users would want to manage.