bchavez / Bogus

:card_index: A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.
Other
8.62k stars 491 forks source link

Using local seed with new properties on POCO updates other properties when it shouldn't #457

Closed sander1095 closed 1 year ago

sander1095 commented 1 year ago

Version Information

Software Version(s)
Bogus NuGet Package 34.0.2
.NET Core? .NET 6
.NET Full Framework?
Windows OS? 11
Linux OS?
Visual Studio? 17.2.1

What locale are you using with Bogus?

en

What is the expected behavior?

The new Description property should get new values and existing properties should keep the same values when generating a list of fake data. I expect this behaviour when I am following the Readme about using local seed data and determinism to have existing properties keep the same data while generating new data for new properties:

What is the actual behavior?

The data gets re-generated with values for the new Description property. The first item keeps the same values for its Name and CreationDate properties (which is good and expected 👍) but the other generated fake data generates new data for the Name and CreationDate properties, even though I follow each step of the Readme precisely. See the screenshot below for a more visualized example of what I mean.

Please provide a stack trace.

There is none because there isn't an exception.

Any possible solutions?

None that I can think of.

How do you reproduce the issue?

Check out my specific branch here: https://github.com/sander1095/bogus-efcore/tree/bogus-determinism-bug-report

In commit f49fa33502cf11a33c81b97cc354ebea4705e4d2 I added the seed data migration which generated lots of data for my Product class and other classes.

In the next commit 6ad0b8c39072118d56bf2020ecb6a9aa29972be5 I added the Description column to my Product class, EF Core configuration and the Product Faker configuration as the final line:

https://github.com/sander1095/bogus-efcore/blob/bogus-determinism-bug-report/BogusContext.cs#LL37C8-L37C8

var productId = 1;
var productFaker = new Faker<Product>()
    .RuleFor(x => x.Id, f => productId++)
    .RuleFor(x => x.Name, f => f.Commerce.ProductName())
    .RuleFor(x => x.CreationDate, f => f.Date.FutureOffset(refDate: new DateTimeOffset(2023, 1, 16, 15, 15, 0, TimeSpan.FromHours(1))))
    .RuleFor(x => x.Description, f => f.Commerce.ProductDescription()); // This is added to the last line, and we are using a local seed (1338) so it shouldn't change the Name and CreationDate properties

var products = productFaker.UseSeed(1338).Generate(1000);

In the final commit 8c4abf80398cbee807fc52b28b58f15a72b3b76b I simply ran dotnet ef migrations add AddDescriptionColumn to update existing data, expecting only the new Description property to be updated with new values.

But as you can see, it doesn't only insert new values for the Description property, it also updates the existing Name and CreationDate properties, but only for the 2nd until the last item, which is quite odd. The first item is done correctly!

afbeelding

Reproducing locally

To reproduce the results of the migration locally, do the following:

Do you have a unit test that can demonstrate the bug?

No, the code, links and images above should be sufficient to demonstrate

Can you identify the location in Bogus' source code where the problem exists?

No.

If the bug is confirmed, would you be willing to submit a PR?

Yes

sander1095 commented 1 year ago

I took some time today to make this issue easier to reproduce by not requiring EF Core.

Run this code and you will see that the first item is generated equally by both faker instances, but all other instances will generate difference results:

https://dotnetfiddle.net/2vfSmr

If that link doesn't work, run this code with .NET 6 and Bogus 34.0.2

// See https://aka.ms/new-console-template for more information
using Bogus;
using System;
using System.Linq;

var itemsToGenerate = 1000;

var productFaker = new Faker<Product>()
    .StrictMode(false)
    .RuleFor(x => x.Name, x => x.Commerce.ProductName())
    .RuleFor(x => x.CreationDate, x => x.Date.FutureOffset(refDate: new DateTimeOffset(2023, 1, 1, 1, 1, 0, TimeSpan.Zero)));

var secondProductFaker = new Faker<Product>()
    .StrictMode(false)
    .RuleFor(x => x.Name, x => x.Commerce.ProductName())
    .RuleFor(x => x.CreationDate, x => x.Date.FutureOffset(refDate: new DateTimeOffset(2023, 1, 1, 1, 1, 0, TimeSpan.Zero)))
    .RuleFor(x => x.Description, x => x.Commerce.ProductDescription());

// Expectation:
// Both fakers should generate the same name and creationDate.
// Description is only added at the end of the 2nd faker.
// According to according to Bogus's README.md this should not change existing data when using a local seed.
var firstRun =  /*  */productFaker.UseSeed(1338).Generate(itemsToGenerate).Select(x => new { x.Name, x.CreationDate }).ToList();
var secondRun = secondProductFaker.UseSeed(1338).Generate(itemsToGenerate).Select(x => new { x.Name, x.CreationDate }).ToList();

// Reality:
// The first item is the same, all other ones are different
var successes = 0;
var failures = 0;
for (int i = 0; i < itemsToGenerate; i++)
{
    var firstRunItem = firstRun[i];
    var secondRunItem = secondRun[i];

    if (firstRunItem.ToString() == secondRunItem.ToString())
    {
        successes++;
    }
    else
    {
        failures++;
    }
}
Console.WriteLine($"Successes: {successes}");
Console.WriteLine($"Failures: {failures}");

Console.ReadKey();

class Product
{
    public string Name { get; set; } = null!;

    public DateTimeOffset CreationDate { get; set; }
    public string Description { get; set; } = null!;

    public override string ToString() => $"{Name}, {CreationDate}, {Description}";
}

Funnily enough, if you were to change secondProductFaker like this:

var secondProductFaker = productFaker.RuleFor(x => x.Description, x => x.Commerce.ProductDescription());

We end up with 1000 successes.

This is sadly not good enough for me, because as you can see in my bug report, I use this with EF Core between runs, so I can't re-use an existing Faker instance.

bchavez commented 1 year ago

Hi @sander1095,

Thank you for your post. The main problem you're running into is that you're only applying the .UseSeed(1338) once for the entire generation.

The way you should think about it is "setting the seed value for the entire row". To illustrate:

image

And in other words, consider the following code in LinqPad 7:

void Main()
{
   var itemsToGenerate = 10;

   var productFaker = new Faker<Product>()
       .StrictMode(false)
       .RuleFor(x => x.Name, x => x.Commerce.ProductName())
       .RuleFor(x => x.CreationDate, x => x.Date.FutureOffset(refDate: new DateTimeOffset(2023, 1, 1, 1, 1, 0, TimeSpan.Zero)));

   var secondProductFaker = new Faker<Product>()
       .StrictMode(false)
       .RuleFor(x => x.Name, x => x.Commerce.ProductName())
       .RuleFor(x => x.CreationDate, x => x.Date.FutureOffset(refDate: new DateTimeOffset(2023, 1, 1, 1, 1, 0, TimeSpan.Zero)))
       .RuleFor(x => x.Description, x => x.Commerce.ProductDescription());

   // Expectation:
   // Both fakers should generate the same name and creationDate.
   // Description is only added at the end of the 2nd faker.
   T SeedRow<T>(Faker<T> faker, int rowId) where T : class
   {
      var recordRow = faker.UseSeed(rowId).Generate();
      return recordRow;
   };

   var firstRun = Enumerable.Range(1, itemsToGenerate)
         .Select(i => SeedRow(productFaker, i))
         .Select(x => new {x.Name, x.CreationDate})
         .ToList();

   var secondRun = Enumerable.Range(1, itemsToGenerate)
         .Select(i => SeedRow(secondProductFaker, i))
         .Select(x => new { x.Name, x.CreationDate })
         .ToList();

   // Reality:
   // The first run is the same as the second run
   var successes = 0;
   var failures = 0;
   for (int i = 0; i < itemsToGenerate; i++)
   {
      var firstRunItem = firstRun[i];
      var secondRunItem = secondRun[i];

      if (firstRunItem.ToString() == secondRunItem.ToString())
      {
         successes++;
      }
      else
      {
         failures++;
         Util.Dif(firstRunItem, secondRunItem).Dump();
      }
   }
   Console.WriteLine($"Successes: {successes}");
   Console.WriteLine($"Failures: {failures}");
}

class Product
{
   public string Name { get; set; } = null!;

   public DateTimeOffset CreationDate { get; set; }
   public string Description { get; set; } = null!;

   public override string ToString() => $"{Name}, {CreationDate}, {Description}";
}

image

Also, thank you for creating a simplified example. That always helps a lot.

Hope this helps. I'm going to close this issue now, but feel free to carry on the conversation if you have any more questions or feedback and I'll try to help!

Thanks, Brian Chavez