QuentinPerez / 42-toolkit

:seedling: Useful structs written in C
GNU General Public License v3.0
108 stars 26 forks source link

htable key generator is broken for very simple inputs #37

Closed julien-lebot closed 10 years ago

julien-lebot commented 10 years ago

Choice of prime number seems to be very important. If too small hash collisions are numerous. Even the 1000th prime number does not seem to be enough to prevent simple test input to collide.

printf("%u\n", f_htable_generate_key(2, "AA"));
printf("%u\n", f_htable_generate_key(2, "AC"));

printf("%u\n", f_htable_generate_key(113, "AA"));
printf("%u\n", f_htable_generate_key(113, "Fb"));

printf("%u\n", f_htable_generate_key(127, "AA"));
printf("%u\n", f_htable_generate_key(127, "Fp"));

printf("%u\n", f_htable_generate_key(7919, "AA"));
printf("%u\n", f_htable_generate_key(7919, "@Q"));

printf("%u\n", f_htable_generate_key(7919, "zzzzzzzz@test.com"));
printf("%u\n", f_htable_generate_key(7919, "aaaaaaaa@test.com"));

printf("zzzzzzzzzzzzzzzz@test.com %u\n", f_htable_generate_key(7919, "zzzzzzzzzzzzzzzz@test.com"));
printf("aaaaaaaaaaaaaaaa@test.com %u\n", f_htable_generate_key(7919, "aaaaaaaaaaaaaaaa@test.com"));

I suggest as a temporary fix (as it's not a real solution) to change the bitwise shift up to:

ret = (ret << 1) + str[i];

This will saturate the hash slower and will generate more entropy with the addition. It's not a solution because it will still eventually saturate.

QuentinPerez commented 10 years ago

Now you can use your own hash function. Are you right with this ?