PomeloFoundation / Pomelo.EntityFrameworkCore.MySql

Entity Framework Core provider for MySQL and MariaDB built on top of MySqlConnector
MIT License
2.69k stars 381 forks source link

Reevaluate DateTimeOffset support #819

Closed lauxjpn closed 5 years ago

lauxjpn commented 5 years ago

At this point in time, MySQL does not have any native support to store a DateTimeOffset (basically a DateTime + its offset from UTC).

The current provider implementation is not working [Update] with all tests[/Update]. On the one hand, it just returns a DateTime, where the offset is already applied. On the other hand, because EF expects a DateTimeOffset returned instead of a DateTime, an exception is thrown.

We should discuss the way to go from here. I think there are at least three possible directions:

  1. Remove the faulty DateTimeOffset implementation entirely. If the user wants to use a DateTimeOffset, he needs to implement his own ValueConverter on his model.
  2. Implement a proprietary representation of a DateTimeOffset on the MySQL storage side (e.g. 64 bit uint, with the highest 5 bits representing the offset, or a string representation of the date + offset).
  3. Keep the current (faulty) implementation alive by making sure, that the MySQL returned DateTime (with the offset already applied) is converted to a DateTimeOffset (with an offset of 0), before it gets returned to the caller/EF (which circumvents the current exception).

For the upcoming 2.2.6 PR, I implemented option 3, though this seems to be the worst one. It also (rightfully so) fails the corresponding tests.

crozone commented 5 years ago

Under what circumstances are you seeing an exception thrown for DateTimeOffset? We have been using DateTimeOffset our entities for over a year without issues. With a single exception (that I have already merged a PR for), even the migrations work in the same way as DateTime.

As to the lack of offset storage support, I believe this is expected and works as intended. MySQL has no column type that supports any timezone offset. This is the same for Postgresql. Pomelo handles DateTimeOffset the same way that Npgsql does, which is where the DateTimeOffset is converted to UTC 0 and stored in the database as DATETIME(6) with UTC 0. This is still highly advantageous compared to using DateTime since there is never any ambiguity as to what conversion will occur when going into the database. It has been the de-facto method for storing DateTimeOffsets in .NET ORMs for some time.

If you need to store the actual timezone that was set, this should be stored as a separate column. The DateTimeOffset can then be adjusted back to this timezone after it is retrieved.

Using the top 5 bits is non-standard and would break any query that expects the time to be a sane value, so I don't like option 2. Option 1 is cumbersome. As far as I was aware, option 3 is close to how it currently already operates, and I'm interested to see your failure case.

lauxjpn commented 5 years ago

The following GearsOfWarQueryMySqlTest tests (which are marked appropriately) fail:

        [ConditionalFact(Skip = "DateTimeOffset is mapped to DateTime, which gives different results than Linq To Objects if Offset term is non-zero")]
        public override void Where_datetimeoffset_hour_component()
        {
            base.Where_datetimeoffset_hour_component();
        }

        [ConditionalFact(Skip = "DateTimeOffset is mapped to DateTime, which gives different results than Linq To Objects if Offset term is non-zero")]
        public override void Where_datetimeoffset_minute_component()
        {
            base.Where_datetimeoffset_minute_component();
        }

With the following exception:

Pomelo.EntityFrameworkCore.MySql.FunctionalTests.Query.GearsOfWarQueryMySqlTest.Where_datetimeoffset_hour_component
   Source: GearsOfWarQueryMySqlTest.cs line: 95
   Duration: 362 ms

  Message: 
    Assert.Equal() Failure
    Expected: 1
    Actual:   0
  Stack Trace: 
    at TestHelpers.AssertResults[T](IList`1 expected, IList`1 actual, Func`2 elementSorter, Action`2 elementAsserter, Boolean verifyOrdered)
    at <AssertQuery>d__18`1.MoveNext()
    at --- End of stack trace from previous location where exception was thrown ---
    at TaskAwaiter.ThrowForNonSuccess(Task task)
    at TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
    at QueryTestBase`1.AssertQuery[TItem1](Func`2 query, Func`2 elementSorter, Action`2 elementAsserter, Boolean assertOrder, Int32 entryCount)
    at GearsOfWarQueryTestBase`1.Where_datetimeoffset_hour_component()
    at GearsOfWarQueryMySqlTest.Where_datetimeoffset_hour_component() in GearsOfWarQueryMySqlTest.cs line: 97

I understand, that this is a workaround, that is being used. I think it is worth discussing though, if this is actually appropriate or should be changed with a major release.

My main argument for option 1 would be, that the main reason to use DateTimeOffset in the first place, is to have a UTC date/time stored with an offset. If the offset isn't retained, why use a DateTimeOffset at all? It can only be appropriately used, if the developer is aware of the fact, that the offset isn't kept and if he is, he could use a DateTime in the first place. I am not arguing, that this isn't practical. It's just not the intended design of a DateTimeOffset.

Another solution could be, to provide a specific option to opt-in to for the developer, to request the non-standard behavior.

crozone commented 5 years ago

If the offset isn't retained, why use a DateTimeOffset at all?

DateTimeOffset still guarantees that the conversion will happen correctly when being saved into the database, and removes all ambiguity. Using just DateTime is more prone to mistakes when parsing dates and then saving them, because parsing a DateTime with a timezone leaves it as DateTimeKind.Unspecified. When saving this, the ORM doesn't know what conversion to do, so it assumes local time, which is incorrect. When it loads it back in, it will again covert back to local time, and suddenly the time is wrong. DateTimeOffset will always have the correct time, even if timezone information is lost.

Having DateTimeOffset lose the datetime information might not be ideal for some usecases. There are times when you don't actually care about the offset, and the offset's main utility is simply guaranteeing that the time is stored correctly relative to UTC 0.

If the original timezone needs to be stored, the correct way to do it is to store the TimeSpan Offset property in a separate column.

The real crux of the issue seems to be whether or not Pomelo should do this separation and recombination automatically under the hood, allowing DateTimeOffset to transparently store the offset.

If yes, possibly this could be enabled using some kind of shadow property, or make it opt-in by having the user specify an extra TimeSpan column under some naming convention (like DateTimeOffset CreatedTime; TimeSpan CreatedTimeOffset. Either way, I think this should be opt-in and not enabled by default, as to not break existing code. Also, as before, no other database adapters currently seem to handle this.

crozone commented 5 years ago

After some additional research, this scenario should already be supported by using these built in value converters:

DateTimeOffsetToBinaryConverter - DateTimeOffset to binary-encoded 64-bit value (stores it as a long, slight reduction in precision)

DateTimeOffsetToBytesConverter - DateTimeOffset to byte array (stores it as a 12 byte array, 8 bytes for time, 4 bytes for offset. Full precision.)

DateTimeOffsetToStringConverter - DateTimeOffset to string (ISO 8601 string including timezone)

These can be activated pretty easily:

builder.Entity<SomeEntity>().Property(se => se.SomeDateTimeOffsetProperty).HasConversion<DateTimeOffsetToBytesConverter>();

Source

I'm still investigating ways to have the database automatically store the offset as a separate column using a custom value converter on a shadow property, however this might be a waste of time given the above methods probably work well for most scenarios. EF doesn't really seem to be set up in a way that allows one property to be mapped to two columns very easily.

lauxjpn commented 5 years ago

Great find! I guess in this case, it's enough to just extend the wiki to document the DateTimeOffset behavior and mention the value converters.

The three value converters cover both: performance relevant scenarios and scenarios, in which the DateTimeOffset should be stored in a human readable form. And the fact, that these are provided by Microsoft, makes them less proprietary in the context of EF Core.

lauxjpn commented 5 years ago

Let's keep this open for a bit, as a reminder to recheck the current DateTimeOffset behavior. There were some suspicious test cases, I want to recheck before the GA release of 3.0.0.

crozone commented 5 years ago

After reviewing the code, I can't actually find anywhere where a conversion to universal time is being done before storage. I suspect this will also lead to differing behaviour depending on column type - TIMESTAMP apparently does a conversion from the database connection timezone into UTC 0 as it is saved. DATETIME apparently does not, it just saves the time as is and then returns it as is. In the DATETIME case, always converting to UTC 0 seems quite necessary.

bgrainger commented 5 years ago

I can't actually find anywhere where a conversion to universal time is being done before storage

In the underlying MySQL library, MySqlConnector, a conversion to UTC is performed if MySqlParameter.Value is a DateTimeOffset: https://github.com/mysql-net/MySqlConnector/blob/05df17a6f4e2f6c899598e00f0790badc7195296/src/MySqlConnector/MySql.Data.MySqlClient/MySqlParameter.cs#L332-L336

The value has to be serialized on the wire somehow, and MySqlConnector elects to coerce everything to UTC, since information will be lost (because MySQL can't store DateTimeOffset natively). The alternative would be throwing an exception.

lauxjpn commented 5 years ago

In the DATETIME case, always converting to UTC 0 seems quite necessary.

@crozone So our current implementation does not explicitly use DateTimeOffset.UtcDateTime and is probably just using local time then.

@bgrainger Thanks for the code reference. We should implement the same behavior for DateTimeOffset literals then to keep the behavior consistent.

We can start by adding a test to check for consistent behavior between queries with and without parameters and go from there to fix our current implementation.

crozone commented 5 years ago

@bgrainger Cheers for finding that reference. I've also found the conversion on the read for future reference too:

https://github.com/mysql-net/MySqlConnector/blob/9b8f3a4d8259144b52990d804f1dca303314289f/src/MySqlConnector/Core/Row.cs#L334

@lauxjpn Good catch with the literals, I wasn't aware that EF inlined constants like that.

bgrainger commented 5 years ago

See https://github.com/mysql-net/MySqlConnector/issues/172 and https://github.com/mysql-net/MySqlConnector/issues/175 for the background of the decisions around DateTimeOffset in MySqlConnector.

VictorioBerra commented 5 years ago

@crozone So our current implementation does not explicitly use DateTimeOffset.UtcDateTime and is probably just using local time then.

So, if someone wanted to use explicitly use DateTimeOffset.UtcDateTime on save, how would they do that?

lauxjpn commented 5 years ago

Using DateTimeOffset.UtcDateTime on save already is the default in 3.0.0. There is no need to use a value converter for that. We fixed the underlying issue with #845.

johnwc commented 2 years ago

I know this is a few years old, but I feel that it needs to be made clear that MySQL does know how to handle time zone information. It may not store it, but it DOES know how to handle it when it's given in INSERT and UPDATE statements. If the time zone is given, it will convert it to UTC. Where if it is not given, it assumes it is UTC. The example given below, shows that MySQL will store both inserts as different dates. With the second adjusted to UTC.

CREATE TABLE IF NOT EXISTS test_TZ (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    DTvalue DATETIME NOT NULL,
    TSvalue TIMESTAMP NOT NULL
);

INSERT INTO test_TZ (DTvalue, TSvalue) VALUES ('2021-11-12 10:30:00', '2021-11-12 10:30:00');
INSERT INTO test_TZ (DTvalue, TSValue) VALUES ('2021-11-12 10:30:00+06:00', '2021-11-12 10:30:00+06:00');
mguinness commented 2 years ago

There's an open issue Implement value conversions that spread out over multiple columns (value objects) in the EF Core repo that could be a possible solution. BTW, what you described was introduced in MySQL 8.0.19 with the following description:

MySQL now supports datetime literals with time zone offsets, such as '2019-12-11 10:40:30-05:00', '2003-04-14 03:30:00+10:00', and '2020-01-01 15:35:45+05:30'; these offsets are respected but not stored when inserting such values into TIMESTAMP and DATETIME columns; that is, offsets are not displayed when retrieving the values.

VictorioBerra commented 2 years ago

@johnwc That is helpful info, I would like to say that you are providing an offset. An offset is not a time zone https://spin.atomicobject.com/2016/07/06/time-zones-offsets/

johnwc commented 2 years ago

@VictorioBerra that is why I said time zone 'information'. As in, the offset comes from the time zone. Most of the time, when you're working with DateTimeOffset, you're working with time zone conversion. Always storing a date as UTC and then display it as a users local/preferred time zone. If you need to know what the original time zone was for the date, that's a hole different column to save. But... this is completely off topic of what we're talking about here in regards to DateTimeOffset.

That link reminds me of the grammer police... Everyone knows what the difference between a time zone and offset is, along with knowing a offset can derive from a time zone.

crozone commented 2 years ago

If the time zone is given, it will convert it to UTC. Where if it is not given, it assumes it is UTC.

This is inaccurate - if the timezone isn't given, it defaults to interpreting the given time in the timezone set in the current database connection context, which is set in the MySQL server configuration. There is no guarantee that this is UTC, on many MariaDB installs this is just the server host timezone. In order to know for sure what the connection timezone is, it either needs to be queried, or set with SET time_zone = '+00:00'.

Complicating things, the actual storage of the timezone differs depending on the column type. TIMESTAMP values are converted from their timezone into UTC upon storage, and converted back from UTC into the context timezone upon retrieval. DATETIME (and other time types) columns are not converted, and are simply stored verbatim as if their timezone values were stripped.

Ref: https://dev.mysql.com/doc/refman/8.0/en/datetime.html

MySQL converts TIMESTAMP values from the current time zone to UTC for storage, and back from UTC to the current time zone for retrieval. (This does not occur for other types such as DATETIME.) By default, the current time zone for each connection is the server's time. The time zone can be set on a per-connection basis. As long as the time zone setting remains constant, you get back the same value you store. If you store a TIMESTAMP value, and then change the time zone and retrieve the value, the retrieved value is different from the value you stored. This occurs because the same time zone was not used for conversion in both directions.

This difference in behaviour obviously presents some complications for the ORM. If storing DATETIME columns in UTC is desired for interoperability, they must be manually converted into UTC by the ORM. However, if this conversion is done for TIMESTAMP columns, the result will be incorrect unless the context timezone is manually set to UTC, or the timestamp includes the +00:00 timezone. Additionally, if CURRENT_TIMESTAMP is used to generate server-side default values, the connection timezone must be aligned. This is a non-issue for TIMESTAMP, since the stored value will always be UTC, but with DATETIME the connection timezone must be the same as whatever Pomelo is converting the values to, or they'll be misaligned.

johnwc commented 2 years ago

@crozone Sorry, I left out the important part about why it converts it to UTC for us. All of our MySQL server's have default-time-zone='+00:00' set in my.cnf.

lauxjpn commented 2 years ago

A general way to preserve the time zone information is to use a Value Converter like DateTimeOffsetToBinaryConverter for DateTimeOffset properties:

myEntity.Property(e => e.MyDateTimeOffsetProperty)
    .HasConversion<long>(); // <-- implicitly uses DateTimeOffsetToBinaryConverter

They are represented in the database as a 64 bit value.