bchavez / Bogus

:card_index: A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.
Other
8.81k stars 500 forks source link

Bogus can generate impossible DateTime values #365

Open logiclrd opened 3 years ago

logiclrd commented 3 years ago

Version Information

Software Version(s)
Bogus NuGet Package 31.0.3, 33.0.2
.NET Core?
.NET Full Framework? 4.8
Windows OS? 10
Linux OS?
Visual Studio? 16.9.0 Preview 5

What locale are you using with Bogus?

en-US

What is the expected behavior?

Generated DateTime values always represent realistic dummy data.

What is the actual behavior?

If the range of accepted values contains a Daylight Savings Time shift forward, sometimes Bogus can return impossible DateTime values. For instance, in my locale, dates between 2021-03-14 02:00:00 and 2021-03-14 02:59:59.9999999 don't normally exist, and supplying such values to formatting and serialization functions can cause unexpected behaviour. Bogus has caused automated tests to fail by returning such values for dummy data.

Please provide a stack trace.

n/a

Any possible solutions?

Detect and correct invalid values at every site that asks Bogus for a DateTime value. But, sometimes these sites are buried inside other libraries (e.g. AutoBogus).

How do you reproduce the ?

Ensure that the range requested includes impossible DateTime values and request enough values and eventually you'll get an impossible value.

One way in which these invalid values can cause real-world problems is in serialization. For instance, if an invalid value is put into a DataTable object, then when that DataTable is serialized and then deserialized, the deserialized copy will have a different DateTime value. In my usage so far, this is mostly a problem in automated testing, and automated testing is where Bogus is used as well.

Do you have a unit test that can demonstrate the bug?

            var faker = new Faker();

            const int SampleCount = 1_000_000;

            int badSampleCount = 0;

            var firstInvalidDateTime = new DateTime(2021, 3, 14, 2, 0, 0);
            var firstValidDateTime = new DateTime(2021, 3, 14, 3, 0, 0);

            for (int i = 0; i < SampleCount; i++)
            {
                var value = faker.Date.Between(new DateTime(2021, 3, 14), new DateTime(2021, 3, 15));

                if ((value >= firstInvalidDateTime) && (value < firstValidDateTime))
                    badSampleCount++;
            }

            Console.WriteLine("Out of {0} samples, {1} were invalid ({2} %) for CDT", SampleCount, badSampleCount, badSampleCount * 100 / SampleCount);

Can you identify the location in Bogus' source code where the problem exists?

Any of the functions that return bogus DateTime values are susceptible if their range of values can include the spring DST switch in a time zone that does daylight savings. I believe these functions are all contained within the Date.cs Bogus data set type.

A complicating factor is that these values are invalid if the returned value is assumed to be in the local timezone and the local timezone includes a daylight savings switch. But, what if the caller wants DateTime values for a different timezone that does not have a daylight savings switch?

Perhaps it would be a reasonable assumption that the basic overloads should all return valid local DateTime values, and possibly consider adding UTC overloads, and possibly overloads where the caller can specify a TimeZoneInfo.

If the bug is confirmed, would you be willing to submit a PR?

If the desired path forward is clearly identified, I could probably make a PR.

logiclrd commented 3 years ago

I wonder if this would work: Always convert the start & end DateTime values to UTC, generate the bogus value in that range as a UTC DateTime, and then convert that back to a local DateTime before returning it.

logiclrd commented 3 years ago

I have made a PR that doesn't add any new overloads or functionality, and simply converts the date range to UTC before calculating a random value, then converts the random value back afterward, taking advantage of the framework's built-in DST transition calculations. I created a unit test that fails with the old code and succeeds with the new code and I'm pretty sure is in fact testing the intended functionality. That's in #366.

bchavez commented 3 years ago

Hi @logiclrd, thank you for the PR. I'll try to review the changes this weekend.