There are two fixes included in this PR which together improve key distribution in the DefaultNodeLocator significantly.
Fix 1: The FNV hashes in FnvHash don't initialize themselves when the hash objects are constructed, so the first hash produced by each instance is computed from an under-randomized starting value. It seems these were written with the assumption that either the Initialize method would be called by the built-in ComputeHash method (which it is, but only after the first hash is computed, to prepare for the next one), or that the IUIntHashAlgorithm.ComputeHash method would be used, which performs the initialization. Neither is true in the DefaultNodeLocator.
This change is backed by an additional test that verifies the FNV1a algorithm against published test vectors.
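For illustration, here is a minimal, self-contained FNV-1a (32-bit) sketch. It is not the library's FnvHash class; it just shows the algorithm seeded from the offset basis on every call, checked against two published FNV-1a test vectors, so the uninitialized-state problem cannot arise in this form.

```csharp
using System;
using System.Text;

// Standalone FNV-1a (32-bit) sketch -- illustrative only, not the library's FnvHash.
static class Fnv1aSketch
{
    const uint OffsetBasis = 2166136261; // 0x811C9DC5: the state every hash must start from
    const uint Prime = 16777619;         // 0x01000193

    public static uint Hash(string s)
    {
        // The state is (re)initialized here on every call, so the "first hash is
        // under-randomized" problem described above cannot occur in this sketch.
        uint hash = OffsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            hash ^= b;                      // FNV-1a: XOR the byte in first...
            hash = unchecked(hash * Prime); // ...then multiply, wrapping at 32 bits
        }
        return hash;
    }

    public static void Main()
    {
        // Published FNV-1a 32-bit test vectors
        Console.WriteLine(Hash("") == 0x811C9DC5u);  // True: "" hashes to the offset basis
        Console.WriteLine(Hash("a") == 0xE40C292Cu); // True
    }
}
```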
Fix 2: I've been noticing fairly uneven distribution of keys across nodes in 3-, 6-, 9-, and 12-node memcached clusters. I've tracked that down to the implementation of DefaultNodeLocator, which populates a consistent hash ring by taking each memcached node's hostname, appending suffixes like "-1", "-2", ... "-100", and hashing the results with FNV1a to create 100 positions per node on the ring.
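As a rough sketch of that scheme (hypothetical host strings and a simplified ring structure, not the library's actual DefaultNodeLocator code; it reuses the Hash helper from the FNV-1a sketch above):

```csharp
using System.Collections.Generic;

// Suffix-based ring population as described above (simplified sketch).
var ring = new SortedDictionary<uint, string>();
foreach (var host in new[] { "cache-a:11211", "cache-b:11211", "cache-c:11211" })
{
    for (int i = 1; i <= 100; i++)
    {
        // 100 ring positions per node, keyed by the hash of "hostname-i"
        ring[Fnv1aSketch.Hash(host + "-" + i)] = host;
    }
}
// A cache key is then mapped to the node owning the first ring position >= Hash(key),
// wrapping around to the lowest position at the end of the ring.
```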
The test DefaultNodeLocatorTest.TestLocator measures the variability from ideal with the implemented DefaultNodeLocator; e.g., if you had 10 keys and 10 servers, ideally you'd get 1 key per server, for a variability of 0%. The current implementation shows a variability of 76% in this test: with 1,000,000 keys distributed over 8 servers, the luckiest server gets only 52,897 keys and the unluckiest gets 219,437, where the ideal would be 125,000 keys per server (a simplified sketch of this measurement follows the test output below).
Expected about 125000 keys per server; got 89676 for server 0; variation: 28.3%
Expected about 125000 keys per server; got 140354 for server 1; variation: 12.3%
Expected about 125000 keys per server; got 198143 for server 2; variation: 58.5%
Expected about 125000 keys per server; got 52897 for server 3; variation: 57.7%
Expected about 125000 keys per server; got 97068 for server 4; variation: 22.3%
Expected about 125000 keys per server; got 53088 for server 5; variation: 57.5%
Expected about 125000 keys per server; got 219437 for server 6; variation: 75.5%
Expected about 125000 keys per server; got 149337 for server 7; variation: 19.5%
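A simplified sketch of that measurement (not the actual DefaultNodeLocatorTest.TestLocator code; it reuses the ring and Hash helper from the sketches above, and uses a linear scan where a real locator would binary-search the sorted positions):

```csharp
using System;
using System.Linq;
using System.Collections.Generic;

// Distribute a large number of keys over the ring and report each server's
// deviation from the ideal share of totalKeys / serverCount.
const int totalKeys = 1_000_000;
int serverCount = ring.Values.Distinct().Count();
var counts = new Dictionary<string, int>();
var rng = new Random(1);

for (int k = 0; k < totalKeys; k++)
{
    uint h = Fnv1aSketch.Hash("key:" + rng.Next());
    // First ring position at or above the key's hash, wrapping to the lowest position.
    string host = ring.FirstOrDefault(p => p.Key >= h).Value ?? ring.First().Value;
    counts[host] = counts.TryGetValue(host, out var c) ? c + 1 : 1;
}

double ideal = (double)totalKeys / serverCount;
foreach (var pair in counts)
    Console.WriteLine($"{pair.Key}: {pair.Value} keys, variation {Math.Abs(pair.Value - ideal) / ideal:P1}");
```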
The root cause of this problem is that the FNV-1a hash algorithm is insensitive to changes at the end of the string:
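A quick way to see why (a standalone sketch reusing the Hash helper above; the host string is hypothetical): when two inputs share everything but the trailing byte, their FNV-1a values are (h ^ b1) * prime and (h ^ b2) * prime for the same running state h, so they differ only by a small multiple of the prime and land close together on the 32-bit ring.

```csharp
// "cache-a:11211-1" ... "cache-a:11211-9" differ only in the final byte, so their
// hashes are (h ^ '1')*prime, (h ^ '2')*prime, ... for one fixed h. For these
// single-digit suffixes the spread is at most ~15 * 16777619, roughly 6% of the
// 2^32 ring, so a node's positions bunch together instead of spreading out.
for (int i = 1; i <= 9; i++)
{
    Console.WriteLine($"cache-a:11211-{i} -> 0x{Fnv1aSketch.Hash("cache-a:11211-" + i):X8}");
}
```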
And modifying the end of the string is exactly what the node locator does for populating the consistent hash ring.
To address this, I've made a change that is simple in principle: I've changed the node ring to be populated as "1-hostname", "2-hostname", etc.
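In the ring-population sketch above, the change amounts to moving the counter to the front of the string, so FNV-1a mixes the varying part through every subsequent byte (again illustrative, not the library's exact code):

```csharp
// Before: counter appended, so only the tail of the input varies (clusters on the ring)
ring[Fnv1aSketch.Hash(host + "-" + i)] = host;

// After: counter prepended, so every byte after it sees a different running state
ring[Fnv1aSketch.Hash(i + "-" + host)] = host;
```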
Here's a comparison of different options I evaluated:
FNV1a, suffixed number: max 76% from ideal distribution
FNV1a, prefixed number: max 16% from ideal distribution
FNV1a, prefixed & suffixed number: max 22% from ideal distribution
MurmurHash: max 17% from ideal distribution
SHA256: max 11% from ideal distribution (as a baseline; this is likely too expensive from a performance perspective)
This change seemed like the lowest-risk, smallest code change with the best outcome.
There is one risk to communicate with this change -- when deployed in an existing application, the key->node distributions will be completely altered, resulting in a high initial rate of cache misses. The distribution fix is probably worth it, but that could be a risk that library users would want to manage.