dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.12k stars 4.7k forks source link

DateTime is serialized as string instead of binary by DataContractSerializer with XmlDictionaryWrite.CreateBinaryWriter #61528

Open sschmieta-qontigo opened 2 years ago

sschmieta-qontigo commented 2 years ago

Description

The WCF binary XML format written by XmlDictionaryWriter.CreateBinaryWriter allows for a binary representation of DateTime objects that is shorter and faster to deserialize than the standard string presentation, e.g. 2021-01-01T00:00:00. The .NET Framework implementation uses the binary format everywhere except in XML attributes. The .NET Core implementation, on the other hand, uses it when serializing DateTime arrays (DataTime[]), but not for lists or object members, e.g. List<DateTime>. That leads to much larger blobs when the serialized objects contain many DateTimes and, more importantly, significantly longer deserialization times.

Reproduction Steps

The following unit test that can be added to src/libraries/System.Runtime.Serialization.Xml/tests/DataContractSerializer.cs demonstrates the issue by binary-serializing a single DateTime object.

    [Fact]
    public static void DCS_BinarySerializationOfDateTime()
    {
        DateTime dateTime = DateTime.Parse("2021-01-01");
        MemoryStream ms = new();
        DataContractSerializer dcs = new(dateTime.GetType());
        using (XmlDictionaryWriter writer = XmlDictionaryWriter.CreateBinaryWriter(ms, null, null, ownsStream: true))
            dcs.WriteObject(writer, dateTime);
        var serializedBytes = ms.ToArray();
        Assert.Equal(72, serializedBytes.Length);
    }

Expected behavior

The binary serialization should be 72 bytes long, matching .NET Framework 4.8, and the above unit test should pass.

Actual behavior

The binary serialization is 84 bytes long, using the textual representation of DateTime, and the unit test fails.

Regression?

Yes, this works in .NET Framework 4.7.2 and 4.8. Probably earlier versions as well. The bug has been presented in all versions of .NET Core as far as I can tell.

Known Workarounds

No response

Configuration

Confirmed in .NET Core 3.1 and .NET 5.0 on x64 Windows. By code inspection the same issue is present in .NET 6.0 and also versions before 3.1. As far as I can tell, the issue is not specific to these configurations.

Other information

The bug has been presented in all versions of .NET Core as far as I can tell. The initial import in the "runtime" git repo has the defect. I traced it to a difference in the implementation of writeDateTime(DateTime value) in src/libraries/System.Private.DataContractSerialization/src/System/Runtime/Serialization/XmlWriterDelegator.cs. The .NET Core version is

         internal virtual void WriteDateTime(DateTime value)
         {
             WriteString(XmlConvert.ToString(value, XmlDateTimeSerializationMode.RoundtripKind));
         }

which forces the use of the textual DateTime representation. The .NET Framework sources have

         internal virtual void WriteDateTime(DateTime value)
         {
            writer.WriteValue(value);
         }

which delegate to the inner writer object which will use the binary representation in most cases.

sschmieta-qontigo commented 2 years ago

@StephenMolloy , any update on the schedule for this ticket? Which 6.0.x release will include the fix?

sschmieta-qontigo commented 2 years ago

Can this please be included in the next 6.0.x release.?The objects we are working with are time-series of yield curves and volatility cubes which contain tens of thousands of DateTime objects. This bug is causing deserialization to take much longer in .NET vs .NET Framework and is causing a serious performance regression for our Azure compute grid when run under .NET Core.

danmoseley commented 2 years ago

Cc @HongGit @StephenMolloy for comment above.

StephenMolloy commented 2 years ago

I believe this was fixed with our alignment of DCS & friends with .Net 4.8 in net7.0-rc1 with #71752.

Since this was not a regression from a previous version of .Net Core, it's not a slam dunk that we'll be able to service this issue in 6.0.x. But the fix here is a very small part of #71752 and relatively low risk. So with vocal customer and business justification, we would like to try to bring it into 6.0 servicing. Can folks tell us more about how this impacts them?

sschmieta-qontigo commented 2 years ago

@StephenMolloy, as I mentioned in the initial bug statement, this is a regression against .NET Framework 4.7.1 that has been present in all .NET Core versions so far. It is causing a serious performance regression when running a .NET Core version of our Azure compute grid which is how we found this bug. We are currently targeting the 6.0 LTS version for our port (up from 3.1 when I filed the bug), so a fix in 6.0.x would be much appreciated.