dnrajugade / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0

Hasher wrapper that appends or prepends the size of the data chunk to avoid hashing collisions #1202

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The Javadoc for `Hasher` recommends using a delimiter in order to avoid hash 
collisions. IMHO this is not a good idea.

A delimiter might work if you pick something exotic, but it can still fail one 
day, whenever the delimiter shows up inside the data (finding a safe one is more 
plausible for strings than for byte[], but there's no universal delimiter either).
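
To make the failure mode concrete, here is a minimal sketch of a delimiter collision. It is an illustration only: the class and method names are hypothetical, and the JDK's `MessageDigest` stands in for Guava's `Hasher` so that the snippet has no dependencies.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Illustration only: names are hypothetical, and MessageDigest stands in for Guava's Hasher.
class DelimiterCollision {
    /** Hashes the fields joined by a delimiter, i.e. the approach being questioned. */
    static byte[] hashWithDelimiter(String delimiter, String... fields) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update(String.join(delimiter, fields).getBytes(StandardCharsets.UTF_8));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Both field lists serialize to the same bytes "a|b|c", so the hashes are equal.
        byte[] h1 = hashWithDelimiter("|", "a|b", "c");
        byte[] h2 = hashWithDelimiter("|", "a", "b|c");
        System.out.println(Arrays.equals(h1, h2)); // prints true
    }
}
```

The collision does not depend on the hash function at all: the two inputs are already identical before hashing.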

There are cases when the user wants to depend on the uniqueness of the hash, 
which allows, e.g., implementing an efficient `equals` for tree-like structures. 
This works well for a sufficiently good hash function (e.g. git uses SHA1).

However, in order to be sure it works, a delimiter doesn't suffice. The Javadoc 
should IMHO recommend the following instead:

newHasher().putInt(s1.length()).putString(s1).putInt(s2.length()).putString(s2);

Note that the length must be prepended, not appended.
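
A self-contained sketch of the same length-prefix idea follows. It assumes UTF-8 encoding and therefore prefixes the encoded byte length rather than `String.length()`. The class and method names are hypothetical, and the JDK's `MessageDigest` is used in place of Guava's `Hasher`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Illustration only: names are hypothetical, and MessageDigest stands in for Guava's Hasher.
class LengthPrefixedHash {
    /** Hashes each field as a 4-byte big-endian length followed by its UTF-8 bytes. */
    static byte[] hash(String... fields) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            for (String field : fields) {
                byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
                md.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                md.update(bytes);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Without length prefixes both calls would hash the bytes "abc".
        // With them, the serialized inputs differ, so the hashes differ too.
        byte[] h1 = hash("ab", "c");
        byte[] h2 = hash("a", "bc");
        System.out.println(Arrays.equals(h1, h2)); // prints false
    }
}
```

Because every field announces its own length, the byte stream can be split back into fields in exactly one way, so two different field lists can only collide if the underlying hash function itself collides.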

Original issue reported on code.google.com by kak@google.com on 13 Nov 2012 at 12:11

GoogleCodeExporter commented 9 years ago
I was wrong about my last sentence; appending works too (it creates a suffix 
code instead of a prefix code).

It might make sense to provide a hasher taking care of this. Simply calling

Hashing.protectedHasher(Hashing.sha1().newHasher()).putString(s1).putString(s2);

would guarantee a unique `HashCode` over all strings `s1` and `s2` until 
somebody breaks SHA1.

It can't protect against everything; it just handles the most common case 
(strings and byte arrays).

I'd say it really should accept nulls. Otherwise the user might lose uniqueness 
too easily by writing

if (s1!=null) hasher.putString(s1);
if (s2!=null) hasher.putString(s2);
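
With those null checks, `(null, "x")` and `("x", null)` feed identical data to the hasher. A rough sketch of a null-accepting wrapper is shown below. Everything in it is an assumption made for illustration: `ProtectedHasher` and its methods are not Guava API, the marker bytes are an arbitrary choice, and the JDK's `MessageDigest` stands in for Guava's `Hasher`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical wrapper: ProtectedHasher and its methods are NOT part of Guava.
final class ProtectedHasher {
    private static final byte NULL_MARKER = 0;    // field is null
    private static final byte PRESENT_MARKER = 1; // length and bytes follow

    private final MessageDigest digest;

    ProtectedHasher(String algorithm) {
        try {
            this.digest = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Feeds one field: a marker byte, then (if non-null) a 4-byte length and the bytes. */
    ProtectedHasher putBytes(byte[] bytes) {
        if (bytes == null) {
            digest.update(NULL_MARKER);
            return this;
        }
        digest.update(PRESENT_MARKER);
        digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
        digest.update(bytes);
        return this;
    }

    /** Encodes the string as UTF-8 (an assumption) and delegates to putBytes. */
    ProtectedHasher putString(String s) {
        return putBytes(s == null ? null : s.getBytes(StandardCharsets.UTF_8));
    }

    byte[] hash() {
        return digest.digest();
    }

    public static void main(String[] args) {
        byte[] h1 = new ProtectedHasher("SHA-1").putString(null).putString("x").hash();
        byte[] h2 = new ProtectedHasher("SHA-1").putString("x").putString(null).hash();
        System.out.println(Arrays.equals(h1, h2)); // prints false
    }
}
```

This sketch does not distinguish a string from a byte array holding the same UTF-8 bytes, which is one example of what such a wrapper cannot protect against.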

Original comment by Maaarti...@gmail.com on 13 Nov 2012 at 9:44

GoogleCodeExporter commented 9 years ago
Thanks for the suggestions, Martin! I've updated the docs internally (should be 
mirrored out soon).

I'm also going to re-file this as a feature request for a "protected" (not sure 
if that's the right name?) Hasher wrapper.

Original comment by kak@google.com on 13 Nov 2012 at 5:11

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:13

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:18

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:08