FakerPHP / Faker

Faker is a PHP library that generates fake data for you
https://fakerphp.github.io

The output of the md5(), sha1() and sha256() generators is misleading #759

Open TimWolla opened 9 months ago

TimWolla commented 9 months ago

Summary

The md5(), sha1() and sha256() generators are documented to return a random MD5, SHA-1 or SHA-256 hash respectively. This is technically true, but it is also misleading, because as written the functions cannot use the full output space of the respective hash. They don't return a random hash; they return the hash of a random 32-bit integer, which is something entirely different.

I'm skipping the remainder of the template, because the issue is evident by looking at the code:

https://github.com/FakerPHP/Faker/blob/57e1f991fbd4add75384b24c1511b91efdc9e1fc/src/Faker/Provider/Miscellaneous.php#L245-L273
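
For readers who don't want to follow the link, the pattern described above can be sketched roughly as follows. This is only an illustration of the problem, not the library's code: hashOfRandomInt() is a hypothetical name, and the 32-bit input range is taken from the description above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical illustration: returns the MD5 of a random 32-bit integer.
 * Not Faker's actual implementation.
 */
function hashOfRandomInt(): string
{
    // Only 2^32 distinct inputs are possible (assumes 64-bit PHP), so at
    // most 2^32 of the 2^128 possible MD5 values can ever be returned.
    $input = random_int(0, 0xFFFFFFFF);

    return md5((string) $input);
}

echo hashOfRandomInt(), PHP_EOL;
```

The output looks like any other MD5 digest, which is why the limitation is easy to miss.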

Possible solution:

Replace the implementation with bin2hex(random_bytes($bytes)), with $bytes being 16, 20 and 32 respectively, to make use of the entire output space [1]. Even then it would be slightly misleading, because hexadecimal is just one possible encoding for a 128/160/256-bit integer. Raw bytes or Base64 would also be valid representations, and both are used in practice for cryptographic hashes.

[1] With PHP 8.2 or later, use \Random\Randomizer::getBytes().
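
A minimal sketch of what that replacement could look like, assuming a single shared helper. The name randomHexDigest() is hypothetical and not part of Faker.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns $bytes random bytes, hex encoded.
 * On PHP 8.2+, (new \Random\Randomizer())->getBytes($bytes) could be
 * used in place of random_bytes(), as noted in [1].
 */
function randomHexDigest(int $bytes): string
{
    // bin2hex() emits two hex characters per input byte.
    return bin2hex(random_bytes($bytes));
}

$md5Like    = randomHexDigest(16); // 32 hex characters, same shape as md5()
$sha1Like   = randomHexDigest(20); // 40 hex characters, same shape as sha1()
$sha256Like = randomHexDigest(32); // 64 hex characters, same shape as SHA-256

echo $md5Like, PHP_EOL, $sha1Like, PHP_EOL, $sha256Like, PHP_EOL;
```

Every bit of the result is random here, so the whole 128/160/256-bit space is reachable.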

kcassam commented 3 months ago

Seems to me that there is no need to use bin2hex(). random_bytes($bytes) returns a raw string of the desired length and does a better job.

TimWolla commented 3 months ago

> random_bytes($bytes) returns a raw string of the desired length and does a better job.

The functions return hexadecimal characters, thus the output from random_bytes() needs to be hex encoded.
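
To make the difference concrete, here is a small sketch comparing the raw output of random_bytes() with its hex and Base64 encodings. The variable names are only for illustration.

```php
<?php

declare(strict_types=1);

$raw = random_bytes(16);     // 16 raw bytes, usually not printable
$hex = bin2hex($raw);        // 32 lowercase hex characters
$b64 = base64_encode($raw);  // 24 characters, ending in "==" padding

var_dump(strlen($raw));            // int(16)
var_dump(strlen($hex));            // int(32)
var_dump(strlen($b64));            // int(24)
var_dump(strlen(md5('example')));  // int(32), same length as $hex
```

Only the hex-encoded form has the same length and alphabet as what md5() returns today.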

kcassam commented 3 months ago

> The functions return hexadecimal characters, thus the output from random_bytes() needs to be hex encoded.

I don't see why. Miscellaneous::md5(), Miscellaneous::sha1() and Miscellaneous::sha256() return hexadecimal characters regardless of their input.

TimWolla commented 3 months ago

The bin2hex() call would replace the call to the hash function. Actually hashing anything is not necessary, because hex-encoded random bytes are indistinguishable from a real hash anyway.