Balzanka / guava-libraries

Automatically exported from code.google.com/p/guava-libraries
Apache License 2.0

Hashing.crc32() provides incorrect result #1332

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Java 1.7 Update 15, Guava 14.0

Example: Hashing.crc32().hashString("1234567890").toString() gives e7033050, 
while the expected result, if I'm not wrong, is 261daee5.

This is not a character encoding problem, since it also happens on byte data 
and, for example, the MD5 hasher works properly.

Original issue reported on code.google.com by izstas@live.ru on 13 Mar 2013 at 6:15

GoogleCodeExporter commented 9 years ago
So I think what you're looking for is:

Hashing.crc32().hashString("1234567890", Charsets.UTF_8).toString();

From the docs of HashCode#toString:
"Returns a string containing each byte of asBytes(), in order, as a two-digit 
unsigned hexadecimal number in lower case. Note that if the output is 
considered to be a single hexadecimal number, this hash code's bytes are the 
big-endian representation of that number. This may be surprising since 
everything else in the hashing API uniformly treats multibyte values as 
little-endian. But this format conveniently matches that of utilities such as 
the UNIX md5sum command."

So the following assertion holds, and its value is basically just your expected 
output with the byte order reversed, right?

assertEquals("e5ae1d26", Hashing.crc32().hashString("1234567890", Charsets.UTF_8).toString());

Original comment by kurt.kluever on 13 Mar 2013 at 6:25
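
To make the byte-order point above concrete, here is a minimal sketch that 
uses only the JDK's java.util.zip.CRC32 rather than Guava. It renders the same 
32-bit checksum in two ways: as a single hexadecimal number, and as its 
little-endian bytes printed in order (the layout the quoted HashCode#toString 
docs describe). The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class Crc32ByteOrder {
    /** CRC-32 of the UTF-8 bytes of the input, kept as 32 raw bits in an int. */
    static int crc32(String input) {
        CRC32 crc = new CRC32();
        crc.update(input.getBytes(StandardCharsets.UTF_8));
        return (int) crc.getValue();
    }

    /** Conventional rendering: the value as one 8-digit hexadecimal number. */
    static String asNumberHex(int value) {
        return String.format("%08x", value);
    }

    /** Byte-wise rendering: the little-endian bytes, each as two hex digits. */
    static String asLittleEndianBytesHex(int value) {
        byte[] bytes = ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(value)
                .array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int value = crc32("1234567890");
        System.out.println(asNumberHex(value));            // 261daee5
        System.out.println(asLittleEndianBytesHex(value)); // e5ae1d26
    }
}
```

Both strings describe the same checksum. Only the order in which the four 
bytes are written out differs.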

GoogleCodeExporter commented 9 years ago
A few more sanity tests:
     CRC32 crc32 = new CRC32();
     crc32.update("1234567890".getBytes(Charsets.UTF_8));
     HashCode hashCode = Hashing.crc32().hashString("1234567890", Charsets.UTF_8);
     assertEquals(crc32.getValue(), (long) hashCode.asInt());
     assertEquals((int) crc32.getValue(), hashCode.asInt());

Original comment by kurt.kluever on 13 Mar 2013 at 6:43
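
One caveat when comparing against CRC32.getValue(), sketched below with the 
JDK only: getValue() returns the checksum as a non-negative long, while an int 
holding the same 32 bits is negative whenever the top bit is set. A plain 
(long) cast then sign-extends and the comparison fails, whereas masking with 
0xFFFFFFFFL does not. The input "1234567890" happens to have the top bit clear, 
so this example uses "123456789" (CRC-32 0xCBF43926) instead. The local asInt 
merely stands in for a value such as hashCode.asInt(), and the class and 
method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class Crc32Unsigned {
    /** CRC-32 of the UTF-8 bytes, as the unsigned value CRC32.getValue() returns. */
    static long crc32AsLong(String input) {
        CRC32 crc = new CRC32();
        crc.update(input.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Widens an int holding 32 raw bits to a long without sign extension. */
    static long toUnsignedLong(int bits) {
        return bits & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long expected = crc32AsLong("123456789"); // 0xCBF43926, top bit set
        int asInt = (int) expected;               // stand-in for hashCode.asInt()

        System.out.println((long) asInt == expected);          // false: sign-extended
        System.out.println(toUnsignedLong(asInt) == expected); // true
    }
}
```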

GoogleCodeExporter commented 9 years ago
Oh well, my example really is an encoding problem, huh. Sorry for the confusion.

Still, this is confusing me. When I use Hashing.md5(), the string is the same 
as the one from md5sum, Windows' HashTab program, etc. When I use 
Hashing.crc32(), I get, as you said, the reverse endianness. I'm not sure 
whether there's any standard for how CRC32 should be represented, but, for 
example, PHP's crc32 function gives what I expect...

Original comment by izstas@live.ru on 13 Mar 2013 at 6:47
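
The encoding point conceded in the comment above can be illustrated with the 
JDK alone. This sketch assumes (it is not confirmed in this thread) that the 
charset-less hashString overload fed each char to the hasher as two bytes, low 
byte first, which is what UTF-16LE produces for these characters. Under that 
assumption the hasher sees 20 bytes instead of the 10 UTF-8 bytes, so the 
checksum differs. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class Crc32Encoding {
    /** CRC-32 of the input after encoding it with the given charset. */
    static long crc32(String input, Charset charset) {
        CRC32 crc = new CRC32();
        crc.update(input.getBytes(charset));
        return crc.getValue();
    }

    public static void main(String[] args) {
        String input = "1234567890";

        // 10 bytes: one byte per ASCII digit.
        long utf8 = crc32(input, StandardCharsets.UTF_8);
        // 20 bytes: each char as two bytes, low byte first (assumed to mirror
        // what the charset-less overload hashed).
        long utf16le = crc32(input, StandardCharsets.UTF_16LE);

        System.out.println(String.format("%08x", utf8)); // 261daee5
        System.out.println(String.format("%08x", utf16le));
        System.out.println(utf8 == utf16le);             // false
    }
}
```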

GoogleCodeExporter commented 9 years ago
Hmm, I'm not familiar with the specifics of PHP's crc32 function, but in any 
case, we had to make a decision on what endianness to print stuff out in for 
the toString() representation, and (as the docs state) we ended up choosing big 
endian to mimic the output of "md5sum".

You might be better off relying on the asBytes() method or asInt() method 
instead?

Original comment by kurt.kluever on 13 Mar 2013 at 7:04

GoogleCodeExporter commented 9 years ago
This issue has been migrated to GitHub.

It can be found at https://github.com/google/guava/issues/<issue id>

Original comment by cgdecker@google.com on 1 Nov 2014 at 4:12

GoogleCodeExporter commented 9 years ago

Original comment by cgdecker@google.com on 3 Nov 2014 at 9:08