dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

XmlTextReaderImpl.Read() significant performance regression in net6.0 #60226

Closed MarkPflug closed 2 years ago

MarkPflug commented 2 years ago

Description

In adding support for .NET Framework 4.8 to a library I maintain, I was surprised to find that it benchmarked significantly faster than the same code running on .NET 6.0. I narrowed the issue down to what appears to be a performance regression in XmlTextReaderImpl.Read() in .NET 6.0 specifically. The library implements a data reader for .xlsx files, so it requires unzipping and reading XML. I created baseline benchmarks that decompress and read the XML nodes to understand where my performance floor was:

[Benchmark]
public void ZipXml()
{
    using var stream = File.OpenRead(file);
    var zipArchive = new ZipArchive(stream);
    var sheetEntry = zipArchive.GetEntry("xl/worksheets/sheet1.xml");
    using var sheetStream = sheetEntry.Open();
    var r = XmlReader.Create(sheetStream);
    while (r.Read()) ;
}

[Benchmark]
public void Zip()
{
    using var stream = File.OpenRead(file);
    var zipArchive = new ZipArchive(stream);
    var sheetEntry = zipArchive.GetEntry("xl/worksheets/sheet1.xml");
    using var sheetStream = sheetEntry.Open();
    byte[] buffer = new byte[0x10000];
    while (sheetStream.Read(buffer, 0, buffer.Length) != 0) ;
}

[The exact file I use is the .xlsx file here.](https://github.com/MarkPflug/Benchmarks/tree/main/source/Benchmarks/Data) The benchmark results of running these on my machine targeting various framework versions:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1266 (21H2)
Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
| framework | name | time | allocated |
| --- | --- | ---: | ---: |
| netcoreapp3.1 | ZipXml | 316.75 ms | 248 KB |
| netcoreapp3.1 | Zip | 48.83 ms | 84 KB |
| net5.0 | ZipXml | 304.80 ms | 248 KB |
| net5.0 | Zip | 51.62 ms | 84 KB |
| net6.0 | ZipXml | 573.91 ms | 244 KB |
| net6.0 | Zip | 48.50 ms | 84 KB |
| net48 | ZipXml | 287.37 ms | 272 KB |
| net48 | Zip | 61.19 ms | 87 KB |

The ZipXml result for net6.0 is an obvious outlier, almost twice as slow as on the other framework versions. All of these runs are x64, with BenchmarkDotNet reporting `X64 RyuJIT`. I specifically call out `XmlTextReaderImpl` because that is the concrete implementation of `Read` being called on net6.0; I have no idea whether that's also true for the other frameworks.

I have .NET 6.0 version 6.0.100-rc.1.21458.32 currently installed.
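For a quick sanity check without BenchmarkDotNet, the same loop can be sketched with crude Stopwatch timing (same archive entry path as the benchmarks above; treat the numbers as rough wall-clock only):

```csharp
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Xml;

static class ZipXmlTiming
{
    // Walks every XML node of xl/worksheets/sheet1.xml inside an .xlsx (zip)
    // and returns the node count plus elapsed wall-clock milliseconds.
    public static (int Nodes, long Ms) Run(string xlsxPath)
    {
        var sw = Stopwatch.StartNew();
        using var stream = File.OpenRead(xlsxPath);
        using var zip = new ZipArchive(stream);
        using var sheet = zip.GetEntry("xl/worksheets/sheet1.xml").Open();
        using var reader = XmlReader.Create(sheet);
        int nodes = 0;
        while (reader.Read()) nodes++;
        sw.Stop();
        return (nodes, sw.ElapsedMilliseconds);
    }
}
```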

Hats off to net4.8, way to go champ.

area-System.Xml

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-xml See info in area-owners.md if you want to be subscribed.

filipnavara commented 2 years ago

This is a very typical finding, but the culprit is not the XML reader; it's the ZIP decompression, which doesn't have the same internal buffering it had on .NET Framework. Try wrapping sheetStream in a BufferedStream to see how it behaves. We observed the same thing with code in ML.NET.

MarkPflug commented 2 years ago

@filipnavara Good thought, but that doesn't seem to be related. I added two more benchmarks. The first was to explore your BufferedStream suggestion. The second was to remove the compression from the equation and read directly from the sheet1.xml.

[Benchmark]
public void ZipXmlBuffered()
{
    using var stream = File.OpenRead(file);
    var zipArchive = new ZipArchive(stream);
    var sheetEntry = zipArchive.GetEntry("xl/worksheets/sheet1.xml");
    using var sheetStream = sheetEntry.Open();
    using var bufferedStream = new BufferedStream(sheetStream, 0x10000);
    var r = XmlReader.Create(bufferedStream);
    while (r.Read()) ;
}

[Benchmark]
public void Xml()
{
    var r = XmlReader.Create("Data/sheet1.xml");
    while (r.Read()) ;
}
Running on .NET 6.0:

| Method | Mean | Error | StdDev | Allocated |
| --- | ---: | ---: | ---: | ---: |
| Xml | 523.35 ms | 9.990 ms | 1.546 ms | 141 KB |
| ZipXml | 568.54 ms | 17.753 ms | 6.331 ms | 236 KB |
| ZipXmlBuffered | 570.30 ms | 24.874 ms | 8.870 ms | 305 KB |
| Zip | 48.46 ms | 0.860 ms | 0.223 ms | 76 KB |

Same benchmark running against .NET 5.0:

| Method | Mean | Error | StdDev | Allocated |
| --- | ---: | ---: | ---: | ---: |
| Xml | 265.45 ms | 30.412 ms | 10.845 ms | 141 KB |
| ZipXml | 300.84 ms | 7.078 ms | 1.838 ms | 240 KB |
| ZipXmlBuffered | 297.89 ms | 15.315 ms | 5.462 ms | 304 KB |
| Zip | 51.62 ms | 0.718 ms | 0.256 ms | 76 KB |

I tried other combinations: constructing the XmlReader over a FileStream with bufferSize = 1 (which, as I understand it, disables FileStream's own buffering), over one with a 64 KB buffer, and over a StreamReader with various buffer sizes. Nothing made much of a difference. I'm happy to be wrong, but it really looks like the culprit remains in the XmlReader.Read method.
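One of those variants, for concreteness, can be sketched like this (the path and buffer sizes are whatever is under test, not fixed values from the issue):

```csharp
using System.IO;
using System.Xml;

static class ReaderVariants
{
    // Counts XML nodes read through a FileStream with an explicit buffer size.
    // bufferSize: 1 effectively disables FileStream's own buffering.
    public static int CountNodes(string path, int bufferSize)
    {
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                      FileShare.Read, bufferSize);
        using var reader = XmlReader.Create(fs);
        int nodes = 0;
        while (reader.Read()) nodes++;
        return nodes;
    }
}
```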

filipnavara commented 2 years ago

Can you share the source code of the benchmark, please? (Sorry, I accidentally scrolled over the part where you posted it.) For the record, this is the issue a member of our team filed for ML.NET. It is a nearly identical scenario of XML within a ZIP file.

MarkPflug commented 2 years ago

To clarify one thing: this issue shows up between .NET 5.0 and .NET 6.0. I mentioned .NET Framework only because that is what brought it to my attention.

XmlReader.Read is the core XML API for .NET, so I assume it should be considered performance-critical. This regression can also be seen in APIs such as XmlDocument.Load, which for the same file goes from 1,649 ms on .NET 5.0 to 1,893 ms on .NET 6.0.
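The shape of that XmlDocument repro, for reference (the XML literal here is a stand-in; the real sheet1.xml is of course far larger):

```csharp
using System.Xml;

static class XmlDocSketch
{
    // XmlDocument parsing drives the same underlying XmlReader loop,
    // so it inherits any regression in Read().
    public static string RootName(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.Name;
    }
}
```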

MarkPflug commented 2 years ago

Looking at the change history on XmlTextReaderImpl, there appear to be two significant changes in the last year (roughly since 5.0). Both were to clean up unsafe code.

The first one appears to remove unsafe code from the XmlCharType, which is used to categorize characters for the parser. I could easily see this one causing a performance regression, as it seems to be a core part of the parser. However, it looks like great care was taken to preserve the performance of the original. https://github.com/dotnet/runtime/commit/f1f7681464bc8e6662e15242d3babca75c914565

The "unsafe" code:

//internal byte* charProperties;
charProperties[str[i]]

Was replaced with "Unsafe" code:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static uint GetCharProperties(char ch) => Unsafe.Add(ref MemoryMarshal.GetReference(CharProperties), ch);

I think this `Unsafe` version is attempting to avoid the bounds check, just as the original pointer version did.
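The pattern can be sketched like this (names and table contents are illustrative, not the real XmlCharType data, which is backed by constant data compiled into the binary): a span-backed lookup table read either through the bounds-checked span indexer or through `Unsafe.Add`, which skips the check and therefore requires the caller to guarantee the index is in range:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class CharClassSketch
{
    // Illustrative 128-entry ASCII property table (all zeros, for shape only).
    private static ReadOnlySpan<byte> Props => new byte[128];

    // Span indexer: bounds-checked.
    public static byte Checked(char ch) => Props[ch];

    // Unsafe.Add over the span's reference: no bounds check, so the caller
    // must guarantee ch < Props.Length.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static byte Unchecked(char ch) =>
        Unsafe.Add(ref MemoryMarshal.GetReference(Props), ch);
}
```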

The second change removes unsafe code from AdjustLineInfo:

https://github.com/dotnet/runtime/commit/997174511ab98c4fa409912db17ce18e7116b126

It doesn't appear to make any great effort to avoid bounds checks, but I don't know how important that is, as I don't know how often it's called. If it's called frequently, perhaps that's enough to show the performance degradation?

krwq commented 2 years ago

@MarkPflug can you try the following:

If that still produces the same bad results, I'll take a look and possibly revert that change. I can't promise it will make it into 6.0.

MarkPflug commented 2 years ago

@krwq I added two more benchmarks, one with MemoryStream as you requested, and one that uses a StringReader in an attempt to remove yet another layer of IO abstraction:

[Benchmark]
public void XmlMemoryStream()
{
    // memStream contains the file bytes
    memStream.Seek(0, SeekOrigin.Begin);
    var r = XmlReader.Create(memStream);
    while (r.Read()) ;
}

[Benchmark]
public void XmlStringReader()
{
    // xml is the string from File.ReadAllText
    var ms = new StringReader(xml);
    var r = XmlReader.Create(ms);
    while (r.Read()) ;
}

This doesn't change the performance significantly. Results:

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Xml | 511.39 ms | 8.539 ms | 1.321 ms | - | - | - | 348 KB |
| XmlMemoryStream | 517.00 ms | 22.310 ms | 7.956 ms | - | - | - | 140 KB |
| XmlStringReader | 505.42 ms | 5.684 ms | 1.476 ms | - | - | - | 155 KB |
| XmlLoad | 1,797.19 ms | 63.593 ms | 16.515 ms | 73000.0000 | 27000.0000 | 4000.0000 | 422,857 KB |
| ZipXml | 559.06 ms | 6.883 ms | 0.377 ms | - | - | - | 236 KB |
| ZipXmlBuffered | 565.74 ms | 37.710 ms | 13.448 ms | - | - | - | 305 KB |
| Zip | 48.02 ms | 0.539 ms | 0.083 ms | - | - | - | 76 KB |

krwq commented 2 years ago

@MarkPflug I'm seeing that when they did https://github.com/dotnet/runtime/pull/41756 they ran some benchmarks and there wasn't any significant difference. I'm curious: could you try going back with git to the state where this PR was merged and run the benchmark against your XML file? We should figure out whether some other change regressed this, or whether there is something specific in your XML that they missed when testing. Either way, we should permanently add this benchmark and improve this back.

MarkPflug commented 2 years ago

Uh, I've never built the runtime before, so I don't really know what I need to do. I pulled the repo and ran build.cmd. This took quite a while but seemed to finish successfully. I then checked out a commit prior to the #41756 version and ran build.cmd again. This time it failed with:

EXEC : error : DIA SDK is missing at "C:\Program Files\Microsoft Visual Studio\2022\Preview\DIA SDK". Did you install all the requirements for building on Windows, including the "Desktop Development with C++" workload?

I don't really think I'm going to be able to diagnose this.

eiriktsarpalis commented 2 years ago

@MarkPflug please see https://github.com/dotnet/runtime/blob/main/docs/workflow/requirements/windows-requirements.md for windows build prerequisites and instructions to build.

bgrainger commented 2 years ago

I'm not able to reproduce this regression on my machine; in fact, .NET 6.0 is fastest.

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1348 (21H2)
Intel Core i7-10875H CPU 2.30GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.100
  [Host]     : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
  Job-VFTWLF : .NET 5.0.12 (5.0.1221.52207), X64 RyuJIT
  Job-EVXDCZ : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
  Job-DHEGHK : .NET Framework 4.8 (4.8.4420.0), X64 RyuJIT
| Method | Job | Runtime | Toolchain | Mean | Error | StdDev | Ratio | RatioSD |
| --- | --- | --- | --- | ---: | ---: | ---: | ---: | ---: |
| ZipXml | Job-VFTWLF | .NET 5.0 | net5.0 | 316.1 ms | 4.77 ms | 8.61 ms | 1.00 | 0.00 |
| ZipXml | Job-EVXDCZ | .NET 6.0 | net6.0 | 282.0 ms | 5.63 ms | 10.15 ms | 0.89 | 0.03 |
| ZipXml | Job-DHEGHK | .NET Framework 4.8 | net48 | 332.6 ms | 2.78 ms | 2.32 ms | 1.03 | 0.04 |
| XmlStringReader | Job-VFTWLF | .NET 5.0 | net5.0 | 241.8 ms | 1.95 ms | 1.63 ms | 1.00 | 0.00 |
| XmlStringReader | Job-EVXDCZ | .NET 6.0 | net6.0 | 205.5 ms | 3.91 ms | 4.95 ms | 0.86 | 0.02 |
| XmlStringReader | Job-DHEGHK | .NET Framework 4.8 | net48 | 235.8 ms | 3.35 ms | 2.97 ms | 0.98 | 0.01 |

MarkPflug commented 2 years ago

@bgrainger is correct: the issue doesn't exist in 6.0. I figured out why my measurements were wrong: `COMPlus_TieredCompilation` was set to `0` on my machine. I don't remember ever setting it, but I guess I must have. This particular XML code path is apparently greatly affected by that setting on 6.0, whereas it wasn't on previous versions. Anyway, deleting that environment variable fixes this "regression". Whoops, sorry for wasting everyone's time.
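For anyone else chasing a phantom regression like this one, it may be worth checking for a tiered-compilation override up front. A minimal sketch (`DOTNET_TieredCompilation` is the newer alias the runtime also reads, alongside the older `COMPlus_` prefix):

```csharp
using System;

static class TieredCheck
{
    // Returns the raw override value, or null when unset
    // (tiered compilation is on by default).
    public static string TieredCompilationSetting() =>
        Environment.GetEnvironmentVariable("DOTNET_TieredCompilation")
        ?? Environment.GetEnvironmentVariable("COMPlus_TieredCompilation");
}
```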