Why hash random bytes?

Dissimilis commented 6 years ago

Why do you require calculating a hash of random bytes? What problem does it solve? https://github.com/SK-EID/smart-id-documentation#611-sending-authentication-request

jaanmurumets-sk commented 6 years ago

It is needed for a secure anti-forgery solution; the hash is verified later: https://github.com/SK-EID/smart-id-documentation#613-verifying-the-authentication-response

We should indeed improve the Smart-ID documentation so that the reasoning is clear. A similar approach is used in many solutions, for example OpenID Connect: https://developers.google.com/identity/protocols/OpenIDConnect

Create an anti-forgery state token

You must protect the security of your users by preventing request forgery attacks. The first step is creating a unique session token that holds state between your app and the user's client. You later match this unique session token with the authentication response returned by the Google OAuth Login service to verify that the user is making the request and not a malicious attacker. These tokens are often referred to as cross-site request forgery (CSRF) tokens.
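To make the state-token idea concrete, here is a minimal Python sketch of the generic pattern the quote describes: issue an unguessable token per session, then accept a response only if it carries the same token back. It is not Smart-ID's or Google's implementation; the `StateStore` class and its in-memory dict are hypothetical stand-ins for real session storage.

```python
import hmac
import secrets
from typing import Dict, Optional


class StateStore:
    """Hypothetical in-memory store of anti-forgery state tokens, keyed by session."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}

    def issue(self, session_id: str) -> str:
        # 32 bytes from the OS CSPRNG, encoded as URL-safe Base64 text.
        token = secrets.token_urlsafe(32)
        self._tokens[session_id] = token
        return token

    def verify(self, session_id: str, returned_state: str) -> bool:
        # Tokens are single-use: remove the stored one whether or not the check passes.
        expected: Optional[str] = self._tokens.pop(session_id, None)
        if expected is None:
            return False
        # Constant-time comparison of the encoded tokens.
        return hmac.compare_digest(expected.encode(), returned_state.encode())


store = StateStore()
state = store.issue("session-1")
print(store.verify("session-1", state))  # True: the response carries our token
print(store.verify("session-1", state))  # False: a replayed token is rejected
```

Making the token single-use is a design choice of this sketch; it means a captured response cannot be replayed within the same session.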

Dissimilis commented 6 years ago

I think you misunderstood my question. You provided an example which generates 64 cryptographically secure random bytes. Then, as a next step, you calculate the SHA512 hash of those random bytes. This is confusing because the purpose of hashing random bytes is not clear.
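For reference, the two steps being questioned can be sketched in Python as follows. This assumes Python's `secrets` module as the counterpart of Java's `SecureRandom` and Base64 as the transport encoding for the digest; `make_auth_hash` is a hypothetical helper, not part of any Smart-ID library.

```python
import base64
import hashlib
import secrets
from typing import Tuple


def make_auth_hash(num_random_bytes: int = 64) -> Tuple[bytes, str]:
    """Hypothetical helper: random bytes -> SHA-512 digest -> Base64 string."""
    # Step 1: draw bytes from the OS CSPRNG (Python's analogue of SecureRandom).
    random_bytes = secrets.token_bytes(num_random_bytes)
    # Step 2: hash them with SHA-512, the step this issue asks about.
    digest = hashlib.sha512(random_bytes).digest()
    # Base64 is assumed here as the encoding used to transmit the digest.
    return digest, base64.b64encode(digest).decode("ascii")


digest, encoded = make_auth_hash()
print(len(digest), len(encoded))  # 64 88
```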

Also, the use of the term "randomly generated hash" in your documentation adds more confusion. It should be either "randomly generated bytes" or "hash of something". Please explain the logic behind "randomly generated hash".

martinpaljak commented 6 years ago

Maybe it helps a bit if the random generator is not-that-good-and-random-at-all?

Dissimilis commented 6 years ago

@martinpaljak The example uses SecureRandom, so your argument is not valid. And even if the RNG were totally deterministic, hashing the output would hardly help. I see it like this: you need 64 random bytes, so you call SecureRandom, which with the help of the OS and hardware does its best to generate unpredictable bytes. Then you feed the resulting bytes to some hash function, possibly destroying all the entropy you got.

user8547 commented 1 week ago

@Dissimilis

A cryptographically secure hash function will appreciably reduce the entropy of its input only if that entropy exceeds the output size of the hash function. E.g., hashing a 32-byte random value with SHA-256 will not appreciably decrease the entropy provided by the 32-byte input.

Indeed, if an RNG is totally broken and always returns the same value, hashing the output will not help, as the hash value will always be the same. However, in practice, broken RNGs tend to return values that are unique but follow some pattern.
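Both points can be checked with a small experiment (a sketch, not a proof): hashing every possible 2-byte input with SHA-256 gives as many distinct digests as there were inputs, so those 16 bits of entropy survive, whereas a stuck RNG gives a single digest however often it is hashed. `distinct_digests` is a hypothetical helper.

```python
import hashlib


def distinct_digests(inputs) -> int:
    """Count the distinct SHA-256 digests produced from an iterable of byte strings."""
    return len({hashlib.sha256(x).digest() for x in inputs})


# All 65,536 two-byte values: a uniform source with 16 bits of entropy.
all_two_byte_values = [i.to_bytes(2, "big") for i in range(2**16)]
print(distinct_digests(all_two_byte_values))  # 65536: every input keeps its own digest

# A totally broken RNG stuck on one output.
stuck_rng_outputs = [bytes(16)] * 1000
print(distinct_digests(stuck_rng_outputs))  # 1: hashing cannot create entropy
```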

Consider the following output returned by a broken RNG (taken from a real-world broken RNG built into a smart card):

8D 5D 3A 2D E8 94 E1 85 9F 42 E8 E3 D9 7C B2 7E .]:-.....B...|.~
8D 5D 3A 2D E8 94 E1 85 9F 42 E8 E3 D9 7C B2 7E .]:-.....B...|.~
71 5D 3A 2D E8 94 E1 85 9F 42 E8 E3 D9 7C B2 7E q]:-.....B...|.~
71 B8 3A 2D E8 94 E1 85 9F 42 E8 E3 D9 7C B2 7E q.:-.....B...|.~
71 B8 AD 2D E8 94 E1 85 9F 42 E8 E3 D9 7C B2 7E q..-.....B...|.~
71 B8 AD CD 29 94 E1 85 9F 42 E8 E3 D9 7C B2 7E q...)....B...|.~
71 B8 AD CD 29 AD E1 85 9F 42 E8 E3 D9 7C B2 7E q...)....B...|.~
71 B8 AD CD 29 AD 8B 85 9F 42 E8 E3 D9 7C B2 7E q...)....B...|.~
71 B8 AD CD 29 AD 8B 6F F4 42 E8 E3 D9 7C B2 7E q...)..o.B...|.~
71 B8 AD CD 29 AD 8B 6F F4 DE E8 E3 D9 7C B2 7E q...)..o.....|.~
71 B8 AD CD 29 AD 8B 6F F4 DE 78 E3 D9 7C B2 7E q...)..o..x..|.~
71 B8 AD CD 29 AD 8B 6F F4 DE 78 EB 82 7C B2 7E q...)..o..x..|.~
71 B8 AD CD 29 AD 8B 6F F4 DE 78 EB 82 1D B2 7E q...)..o..x....~
71 B8 AD CD 29 AD 8B 6F F4 DE 78 EB 82 1D AE 7E q...)..o..x....~
71 B8 AD CD 29 AD 8B 6F F4 DE 78 EB 82 1D AE BC q...)..o..x.....
71 B8 AD CD 29 AD 8B 6F F4 DE 78 EB B7 B8 AD    q...)..o..x....

By observing this output, we can spot the pattern visually and predict the next values of the RNG. However, if we hashed each output, the results would be indistinguishable from random byte strings (unless the entropy of the input is so low that an attacker could brute-force it).
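The effect can be illustrated with rows two to five of the dump above. The sketch counts how many byte positions each output shares with the previous one, before and after hashing with SHA-256 (the hash choice is an assumption; any cryptographic hash would do). It only shows that the visible pattern disappears: hashing adds no entropy, so identical outputs (such as the first two rows) still produce identical digests, and an attacker who can enumerate the RNG's likely states can still reproduce the hashes.

```python
import hashlib

# Rows two to five of the smart-card RNG dump above (16 bytes each).
RAW_OUTPUTS = [
    bytes.fromhex("8D 5D 3A 2D E8 94 E1 85 9F 42 E8 E3 D9 7C B2 7E"),
    bytes.fromhex("71 5D 3A 2D E8 94 E1 85 9F 42 E8 E3 D9 7C B2 7E"),
    bytes.fromhex("71 B8 3A 2D E8 94 E1 85 9F 42 E8 E3 D9 7C B2 7E"),
    bytes.fromhex("71 B8 AD 2D E8 94 E1 85 9F 42 E8 E3 D9 7C B2 7E"),
]


def shared_positions(a: bytes, b: bytes) -> int:
    """Count the byte positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


def consecutive_similarity(outputs):
    """Shared byte positions between each output and the one before it."""
    return [shared_positions(a, b) for a, b in zip(outputs, outputs[1:])]


hashed_outputs = [hashlib.sha256(out).digest() for out in RAW_OUTPUTS]

raw_similarity = consecutive_similarity(RAW_OUTPUTS)
hashed_similarity = consecutive_similarity(hashed_outputs)
print("raw:   ", raw_similarity)  # [15, 15, 15]: only one byte changes each time
print("hashed:", hashed_similarity)  # expected near 0 (about 1 in 256 positions by chance)
```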

Consider another, more extreme case, where a broken RNG returns the memory contents of the device. By hashing the RNG output, we can avoid leaking sensitive memory contents to other parties.
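A toy version of this scenario, under clearly stated assumptions: `leaky_rng` is a hypothetical broken generator that returns a slice of fake device memory containing a secret. Publishing its raw output discloses the secret, while publishing a SHA-512 digest of it does not reveal those bytes. The digest is still only as unpredictable as the underlying memory contents, so this hides the leak rather than fixing the RNG.

```python
import hashlib

# Hypothetical device memory that a broken RNG might return (illustrative only).
FAKE_MEMORY = b"session_key=4f1c9a;user_pin=1234;" + bytes(40)


def leaky_rng(n: int) -> bytes:
    """Hypothetical broken RNG: returns raw memory instead of random bytes."""
    return FAKE_MEMORY[:n]


def disclosed_value(n: int = 64) -> bytes:
    """Hash the RNG output before it is disclosed to another party."""
    return hashlib.sha512(leaky_rng(n)).digest()


print(b"user_pin" in leaky_rng(64))  # True: raw output exposes memory
print(b"user_pin" in disclosed_value())  # False: the digest does not contain it
```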

To conclude, it is a good hardening measure to hash RNG output before using it in cases where the value is disclosed to other parties.

SK-EID / smart-id-documentation

Why hash random bytes? #1