apense / smhasher

Automatically exported from code.google.com/p/smhasher
0 stars 0 forks source link

Lots of collisions with triplets of integers #12

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I was interested in using a hash for integer arrays with few numbers in them.
So I searched for any collisions in a very sample case: unique triplets of 
numbers of the form (i,j,k) , where 1<= i< j < k <=255. The hash is calculated 
from the simple uint32_t array made with those 3 numbers. There are dozens of 
repetitions!

What steps will reproduce the problem?
-Run the enclosed C++ file
See the attached "sample output" for identified collisions.

Am I doing something wrong?

Original issue reported on code.google.com by par...@gmail.com on 10 Mar 2012 at 11:10

Attachments:

GoogleCodeExporter commented 9 years ago
Unfortunately the laws of probability are against you - according to the 
Birthday Paradox, you'll probably see collisions in the results of an N-bit 
hash after hashing 2^(N/2) keys - for a 32-bit hash, that's 65536 keys. Your 
test code is hashing ~2.7 million keys, so you are bound to see collisions.

See http://en.wikipedia.org/wiki/Birthday_attack for interesting details.

Original comment by tanj...@gmail.com on 11 May 2012 at 6:08

GoogleCodeExporter commented 9 years ago

Original comment by tanj...@gmail.com on 11 May 2012 at 6:08