FakerPHP / Faker

Faker is a PHP library that generates fake data for you
https://fakerphp.github.io
Other
3.47k stars 330 forks source link

Fix issue #759 : The output of the hash generators is misleading #880

Open kcassam opened 2 months ago

kcassam commented 2 months ago

md5(), sha1() and sha256() generators is misleading

What is the reason for this PR?

Author's checklist

Summary of changes

In order to leverage the full output space of the respective hashes, I use the the function random_bytes($length) In order to keep those functions seedable, I add a $seed parameter that can be used to pass a parameter so that the function will always return the same hash given the same seed.

When this one is merged, I can take car of https://github.com/FakerPHP/Faker/issues/761

Review checklist

TimWolla commented 2 months ago

This change is incorrect, because it prevents seeding when using these functions, whereas the previous implementation was seedable.

kcassam commented 2 months ago

This change is incorrect, because it prevents seeding when using these functions, whereas the previous implementation was seedable.

@TimWolla I have corrected, seeding is OK now and I added some new test to garantee seedibility. However, the expected result will change after updating the library, so it can break some tests

I don't know if this is considered as BC and if this is OK. Otherwise, we can keep the tests with a change on expected values.

kcassam commented 2 months ago

Thanks for the reviews @TimWolla. I have another idea, I will submit it later on.

kcassam commented 2 months ago

@TimWolla the PR is ready for review. ✅ seedable (at the cost of a small BC) ✅ seedibility tested ✅ no bias ✅ ext/bcmath not used ✅ no more multiplyNumberBetweenNTimes clumsy/ugly function ✅ issue #759 solved

kcassam commented 2 months ago

@TimWolla Alternatively, to avoid any BC, we could keep the legacy methods without modifying them and create new ones with a slightly different name and with the new behavior, for example :

    public static function md5()
    {
        return md5(self::numberBetween());
    }

    public static function maxEntropyMd5($seed = null)
    {
        return md5($seed ?? random_bytes(128 / 8));
    }

We could use bin2hex instead of hash functions for performance reason. I let you decide if it is a better treadoff. bin2hex is about 5 times faster than sha256. https://onlinephp.io/c/c006b. We would have to complete/crop the seed to the desired length, but that is easy to do with substr and str_pad

stale[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 1 week if no further activity occurs. Thank you for your contributions.

kcassam commented 2 months ago

@TimWolla hi, is it possible to review this PR ?