Azure / azure-cosmos-dotnet-v3

.NET SDK for Azure Cosmos DB for the core SQL API
MIT License
743 stars 495 forks source link

Default serialization settings do not handle DateTime values according to Cosmos documented recommendations #4904

Open DaRosenberg opened 1 day ago

DaRosenberg commented 1 day ago

Describe the bug

Serializing a DateTime into the ISO 8601 format used by default in Newtonsoft.Json, or the extended ISO 8601-1:2019 profile used by default in System.Text.Json, omits fraction digits if the fraction is 0. When time zone information is included in the DateTime being serialized this leads to unexpected results when querying because the resulting string values cannot be correctly lexicographically compared. For example 2024-11-23T12:30:00Z will be considered greater than 2024-11-23T12:30:00.123Z in Cosmos queries, even though when compared as actual DateTime values, the opposite is in fact true.

For this reason the Cosmos documentation recommends always serializing DateTime values in full round-trip format: https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/query/working-with-dates#storing-datetimes

Serializing into full round-trip format (.NET standard format string "O") preserves exactly 7 fraction digits even if the fraction is exactly 0. This solves the correctness problem by making all serialized DateTime values lexicographically comparable. In addition, it retains compatibility with the default ISO 8601 formats, because those allow for the fraction digits to be present even if they are all 0.

Despite this recommendation the default configuration of the Cosmos SDK for .NET (regardless of which serializer is used) makes no effort to configure for this guideline.

Expected behavior

I would expect the official Microsoft-provided .NET SDK for Cosmos DB to have default configuration that is compliant with the Cosmos recommendations.

Actual behavior

Instead, the SDK makes has no such configuration, and instead chooses to rely on the default behavior of the supported serializers, which leads to incorrect results that can be very hard to discover.

Workaround

To work around the issue, this converter can be used with the System.Text.Json serializer to ensure that serialization uses the full round-trip format. Note that conversion is only done on serialization - not during deserialization. This is because the round-trip format is compliant with the extended ISO 8601-1:2019 profile used by default. The default deserialization behavior parses up to 7 fraction digits, even if they are all zero.

public class RoundTripDateTimeConverter : JsonConverter<DateTime>
{
    public override DateTime Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        return reader.GetDateTime();
    }

    public override void Write(Utf8JsonWriter writer, DateTime value, JsonSerializerOptions options)
    {
        // The "O" standard format will always be between 27 and 33 characters depending on DateTime kind and time zone.
        Span<byte> utf8Date = new byte[33];

        if (!Utf8Formatter.TryFormat(value, utf8Date, out var numBytes, new StandardFormat('O')))
        {
            throw new FormatException("Failed to format the input DateTime value into a round-trip DateTime string.");
        }

        writer.WriteStringValue(utf8Date[..numBytes]);
    }
}

Environment summary SDK Version: 3.46.0

cliedeman commented 15 hours ago

Also investigating this while starting a new project using cosmos with the Ef Core Provider The Ef core formatting only use the Z style when DateTime.Kind = Utc.

From the docs Image

I have the following document

{
    "id": "1",
    "t1": "2021-01-01T02:00:00+02:00",
    "t2": "2021-01-01T02:00:00Z",
    "_rid": "TDIIALfNC78BAAAAAAAAAA==",
    "_self": "dbs/TDIIAA==/colls/TDIIALfNC78=/docs/TDIIALfNC78BAAAAAAAAAA==/",
    "_etag": "\"77033b18-0000-0c00-0000-674373760000\"",
    "_attachments": "attachments/",
    "_ts": 1732473718
}

but this query succeeds

SELECT * FROM c where c.t1 < '2021-01-01T02:01:00+02:00' and c.t1 > '2021-01-01T01:59:00+02:00'

which I fully expect not to work? So its possible those docs are out of date. Ef core will also omit the fraction digits so this issue is relevant for the Efcore Cosmos Provider too

DaRosenberg commented 11 hours ago

@cliedeman That query will succeed because the minutes are different, so even lexicographically, the comparison will never reach the point in the string where the presence or absence of fraction digits makes a difference.