TimeAndSpaceIO / SmoothieMap

A gulp of low latency Java
295 stars 30 forks source link

Keys collision by 30 lowest bits limitation #4

Closed frajt closed 6 years ago

frajt commented 8 years ago

Hi,

I am just wondering whether or not the 30 lowest bit keys collision limitation is no-go for a real production usage? I cannot imagine that we deploy code which can anytime run into an illegal state exception without us having any chance to fix it. Whatever 'salt' we can add to key hash codes we still cannot guarantee that keys won't collide at 30 lowest bits. I understand that the probability is very low as the collision is limited to a single segment only but the Smoothie Map code has very explicit termination by throwing the illegal state exception.

Isn't there an option to somehow re-shake the data to get rid of the segment collision even with the put operation cost increase to O(N)?

Regards, Michal

leventov commented 7 years ago

@frajt, the probability is about 10-19 * number of keys in your map. Given that in "real production usage" applications crash routinely bacause of bugs, OutOfMemoryErrors, etc., I would not consider this collision problem a no-go.

To ensure even key hash code distribution, you should override SmoothieMap.keyHashCode() and correspondingly keysEqual(). E. g. if your keys are Strings, you can use LongHashFunction.xx_r39(someRuntimeGeneratedRandomSeed).hashChars(key) instead of the infamous default String's hashCode().

In Smoothie Map 1.3, there is no workaround for this, if 32 keys do happen to collide. It is possible to change the source code so that it would not crash in this case, and continue to work with linear key access complexity, it's just not implemented.