JIABI / xxhash

Automatically exported from code.google.com/p/xxhash
Other
0 stars 0 forks source link

Collision? (I must be doing something wrong here...) #14

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. See collision_course.cpp
2. Compiled like this: gcc xxhash.c collision_course.cpp -o collision_course
3. Running it like ./collision_course produces output c91c0c6d vs. c91c0c6d

What is the expected output? What do you see instead?
I expect to see two different hashes, but instead they are the same.

What version of the product are you using? On what operating system?
The implementations used are attached, they have been retrieved from trunk on 
13 SEP 2013.
This is running on Mac OS X.

Please provide any additional information below.
Maybe I'm using the hash in a way that it's not intended, can't really tell 
from documentation/comments in source code.

Original issue reported on code.google.com by mooroobo...@gmail.com on 19 Sep 2013 at 9:51

Attachments:

GoogleCodeExporter commented 8 years ago
Hello

The current version of xxHash is 32-bits only. That means it produce a value 
between 0 and 4 billions (approximately).

There is never a guarantee that 2 different inputs produce different hashes. 
It's just statistically the case, but not guaranteed.

The probability that 2 different entries produce identical hashes is low (1 
chance in 4 billions), but is not zero. It might seem very low, but when 
generating a lot of hashes, it becomes perceptible. For example, if you 
generate ~60 thousands 32-bits hashes, you have almost 50% probability that 2 
of them will be equal. This is mathematical.

You never have a guarantee that 2 different entries produce different hashes. 
Since is impossible, due to pigeon hole principle.
However, it's possible to make this probability so small, that it becomes 
almost equivalent to "impossible".

Such a "small enough probability" property is typically achieved when using 
128-bits hashes, or even 64-bits for a small number of hashes.

But for 32-bits hashes, the rule of thumb is that you can't consider hashes to 
be unique. Even at 1/4Billions, it's still too high.

Note : It's possible to produce higher resolution hashes from xxHash, such as 
64-bits & 128 bits one, but there is no published code for it yet.

Original comment by yann.col...@gmail.com on 19 Sep 2013 at 10:05

GoogleCodeExporter commented 8 years ago
Hi,

Thanks so much for the quick response and clear explanation.
Really appreciated!

Original comment by mooroobo...@gmail.com on 19 Sep 2013 at 1:02

GoogleCodeExporter commented 8 years ago

Original comment by yann.col...@gmail.com on 19 Sep 2013 at 1:46