Closed GoogleCodeExporter closed 8 years ago
Hello
The current version of xxHash is 32-bits only. That means it produce a value
between 0 and 4 billions (approximately).
There is never a guarantee that 2 different inputs produce different hashes.
It's just statistically the case, but not guaranteed.
The probability that 2 different entries produce identical hashes is low (1
chance in 4 billions), but is not zero. It might seem very low, but when
generating a lot of hashes, it becomes perceptible. For example, if you
generate ~60 thousands 32-bits hashes, you have almost 50% probability that 2
of them will be equal. This is mathematical.
You never have a guarantee that 2 different entries produce different hashes.
Since is impossible, due to pigeon hole principle.
However, it's possible to make this probability so small, that it becomes
almost equivalent to "impossible".
Such a "small enough probability" property is typically achieved when using
128-bits hashes, or even 64-bits for a small number of hashes.
But for 32-bits hashes, the rule of thumb is that you can't consider hashes to
be unique. Even at 1/4Billions, it's still too high.
Note : It's possible to produce higher resolution hashes from xxHash, such as
64-bits & 128 bits one, but there is no published code for it yet.
Original comment by yann.col...@gmail.com
on 19 Sep 2013 at 10:05
Hi,
Thanks so much for the quick response and clear explanation.
Really appreciated!
Original comment by mooroobo...@gmail.com
on 19 Sep 2013 at 1:02
Original comment by yann.col...@gmail.com
on 19 Sep 2013 at 1:46
Original issue reported on code.google.com by
mooroobo...@gmail.com
on 19 Sep 2013 at 9:51Attachments: