FakerPHP / Faker

Faker is a PHP library that generates fake data for you
https://fakerphp.github.io
Other
3.47k stars 330 forks source link

`Generator::seed` in `Generator::__destruct` may break determinism #870

Open themasch opened 3 months ago

themasch commented 3 months ago

Summary

TL;DR:

We observed that, when re-creating new Faker\Generators between test cases, a "random" PHP garbage collector run happening on object allocation triggers Faker\Generator::__destruct which changes the mt_rand seed, breaking further fixture generation in a non-deterministic way.

Context

I am not sure if I would consider this an actual bug of Faker. It's more that the specific use case makes it break in some (rare) conditions. We do not use Faker itself directly, but through the symfony integration of Alice.

We use it to generate test data for phpunit test. Alice populates our test database, using faker to fill in fields that need data, but not predefined specific values. Works great for snapshot test and thelike.

The Problem

With one test, we ran into an issue where the test would sometimes, on some systems (mainly CI) fail due to Faker generating different data for that run. Not always the "same different" data, not always the exact same test case, and not on every run.

Setup

The way its setup looks like this:

For each test case, we "boot" a fresh Symfony stack, recreating all services. In the test configuration, Faker\Generator is a service in that stack, used by Alice to populate the test data.

So on each test, we create all these instances, including a Faker\Generator, which also gets a predefined seed, use Alice to read our fixture definitions and create objects from it, using Faker to populate properties, start a fresh db transaction, insert that data into our test db, run tests against that DB, and compare the result with snapshots. Then we rollback the whole transaction, teardown the symfony stack, and start the next test with a fresh stack, (that calls seed again) and a fresh clean DB.

probable root cause

After some debugging we found that sometimes Faker\Generator::__destruct is called while Alice creates the test data, on random object creations. I think that this is the PHP GC kicking in, claiming the Faker\Generator instance used in the previous test run.

This would also fit on the behaviour seen in #272

It also explains why we only see these problems in some cases, only on some hosts, and never when just running one single test alone.

Building a minimal test case I could reproduce that problem, and commenting out the seed() call in the __destruct also "fixed" the issue.

Conclusion

Our current workaround is to call gc_collect_cycles in the phpunit setUp method, before starting the new Symfony stack. This, as expected triggers Faker\Generator::__destruct and thereby seed before the new instance is created and sets the expected seed.

From our observation, that seems to have fixed our breaking tests.

But I am not sure if that really what we want to do here, and in all tests that use Alice/Faker. The reason these exact test cases failed, and all others using the same mechanism did not, I can only guess. Most likely it is just the amount of data. We usually try to limit test data to a minimum, so do not created tens or hundreds of entities with it. For the new one, we required more data, and therefor I figure we just pushed it "over the edge" where this test case would have a considerably higher chance of triggering a GC while building fixtures than every other test.

So for me, there are a few different ways this can be resolved:

I file the issue here, and not in on of the upstream projects, because I have to start somewhere and Faker seemed like the correct place.

Versions

Version
PHP >8.2.9 (also noticed with 8.2.16 and 8.3.2)
fakerphp/faker 1.23.1

Self-enclosed code snippet for reproduction

<?php

declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

$pn1 = null;
$pn2 = null;

for ($i = 0; $i < 3; $i++) {
    $faker = Faker\Factory::create('en_US');
    $faker->seed(1234567);

    $n1 = $faker->numberBetween(1000, 10000);
    if ($pn1 !== null && $pn1 !== $n1) {
        echo sprintf('DELTA in N1: previous: %d, this time: %d, iteration: %d', $pn1, $n1, $i) . PHP_EOL;
    }
    $pn1 = $n1;

    gc_collect_cycles(); // force GC run

    $n2 = $faker->numberBetween(1000, 10000);
    if ($pn2 !== null && $pn2 !== $n2) {
        echo sprintf('DELTA in N2: previous: %d, this time: %d, iteration: %d', $pn2, $n2, $i) . PHP_EOL;
    }
    $pn2 = $n2;

    var_dump([$n1, $n2]);
}

I use gc_collect_cycles to urge PHP to collect the dangling Generator instance here. In reality, this happens "somewhere" down the line as more and more new object are allocated.

Expected output (with PHP 8.3 and thus MT_RAND_MT19937)

array(2) {
  [0]=>
  int(2429)
  [1]=>
  int(5714)
}
array(2) {
  [0]=>
  int(2429)
  [1]=>
  int(5714)
}
array(2) {
  [0]=>
  int(2429)
  [1]=>
  int(5714)
}

Actual output

array(2) {
  [0]=>
  int(2429)
  [1]=>
  int(5714)
}
DELTA in N2: previous: 5714, this time: 3052, iteration: 1
array(2) {
  [0]=>
  int(2429)
  [1]=>
  int(3052)
}
DELTA in N2: previous: 3052, this time: 2717, iteration: 2
array(2) {
  [0]=>
  int(2429)
  [1]=>
  int(2717)
}
coajaxial commented 3 months ago

I have the exact same problem when using zenstruck/foundry! When creating around 50+ Fixtures using a createMany() call, the seeding started to break. It took me a while to find out it was the __destruct call from some previous Generator instances, that get created by either booting the kernel more then one time in a single test, but also because Foundry creates some Generator instances on the fly (I didn't investigate why). Thanks for the detailed bug report and I hope this will get fixed soon :)