dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

XmlTextReaderImpl.Read() significant performance regression in net6.0 #60226

Closed MarkPflug closed 2 years ago

MarkPflug commented 2 years ago

Description

In adding support for .NET Framework 4.8 to a library I maintain, I was surprised to find that it benchmarked significantly faster than the same code running on .NET 6.0. I narrowed the issue down to what appears to be a performance regression in XmlTextReaderImpl.Read() in .NET 6.0 specifically. The library implements a data reader for .xlsx files, so it requires unzipping and reading XML. I created baseline benchmarks that decompress and read the XML nodes to understand where my performance floor was:

[Benchmark]
public void ZipXml()
{
    using var stream = File.OpenRead(file);
    var zipArchive = new ZipArchive(stream);
    var sheetEntry = zipArchive.GetEntry("xl/worksheets/sheet1.xml");
    using var sheetStream = sheetEntry.Open();
    var r = XmlReader.Create(sheetStream);
    while (r.Read()) ;
}

[Benchmark]
public void Zip()
{
    using var stream = File.OpenRead(file);
    var zipArchive = new ZipArchive(stream);
    var sheetEntry = zipArchive.GetEntry("xl/worksheets/sheet1.xml");
    using var sheetStream = sheetEntry.Open();
    byte[] buffer = new byte[0x10000];
    while (sheetStream.Read(buffer, 0, buffer.Length) != 0) ;
}

[The exact file I use is the .xlsx file here.](https://github.com/MarkPflug/Benchmarks/tree/main/source/Benchmarks/Data) The benchmark results of running these on my machine targeting various framework versions:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1266 (21H2)
Intel Core i7-7700K CPU 4.20GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
| framework | name | time | allocated |
| --- | --- | ---: | ---: |
| netcoreapp3.1 | ZipXml | 316.75 ms | 248 KB |
| netcoreapp3.1 | Zip | 48.83 ms | 84 KB |
| net5.0 | ZipXml | 304.80 ms | 248 KB |
| net5.0 | Zip | 51.62 ms | 84 KB |
| net6.0 | ZipXml | 573.91 ms | 244 KB |
| net6.0 | Zip | 48.50 ms | 84 KB |
| net48 | ZipXml | 287.37 ms | 272 KB |
| net48 | Zip | 61.19 ms | 87 KB |

The ZipXml result for net6.0 is an obvious outlier, almost twice as slow as on the other framework versions. All of these runs are x64, with BenchmarkDotNet reporting `X64 RyuJIT`. I specifically call out `XmlTextReaderImpl` because that is the concrete implementation of `Read` being called on net6.0; I have no idea whether that's also true for the other frameworks.

I have .NET 6.0 version 6.0.100-rc.1.21458.32 currently installed.
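For a quick sanity check without BenchmarkDotNet, the same loop can be sketched with crude Stopwatch timing (same archive entry path as the benchmarks above; treat the numbers as rough wall-clock only):

```csharp
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Xml;

static class ZipXmlTiming
{
    // Walks every XML node of xl/worksheets/sheet1.xml inside an .xlsx (zip)
    // and returns the node count plus elapsed wall-clock milliseconds.
    public static (int Nodes, long Ms) Run(string xlsxPath)
    {
        var sw = Stopwatch.StartNew();
        using var stream = File.OpenRead(xlsxPath);
        using var zip = new ZipArchive(stream);
        using var sheet = zip.GetEntry("xl/worksheets/sheet1.xml").Open();
        using var reader = XmlReader.Create(sheet);
        int nodes = 0;
        while (reader.Read()) nodes++;
        sw.Stop();
        return (nodes, sw.ElapsedMilliseconds);
    }
}
```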

Hats off to net4.8, way to go champ.

area-System.Xml

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-xml See info in area-owners.md if you want to be subscribed.

filipnavara commented 2 years ago

This is a very typical finding, but the culprit is not the XML reader; it's the ZIP decompression, which doesn't have the same internal buffering it had on .NET Framework. Try wrapping sheetStream in a BufferedStream to see how it behaves. We observed the same thing with code in ML.NET.

MarkPflug commented 2 years ago

@filipnavara Good thought, but that doesn't seem to be related. I added two more benchmarks. The first was to explore your BufferedStream suggestion. The second was to remove the compression from the equation and read directly from the sheet1.xml.

[Benchmark]
public void ZipXmlBuffered()
{
    using var stream = File.OpenRead(file);
    var zipArchive = new ZipArchive(stream);
    var sheetEntry = zipArchive.GetEntry("xl/worksheets/sheet1.xml");
    using var sheetStream = sheetEntry.Open();
    using var bufferedStream = new BufferedStream(sheetStream, 0x10000);
    var r = XmlReader.Create(bufferedStream);
    while (r.Read()) ;
}

[Benchmark]
public void Xml()
{
    var r = XmlReader.Create("Data/sheet1.xml");
    while (r.Read()) ;
}
Running on .NET 6.0:

| Method | Mean | Error | StdDev | Allocated |
| --- | ---: | ---: | ---: | ---: |
| Xml | 523.35 ms | 9.990 ms | 1.546 ms | 141 KB |
| ZipXml | 568.54 ms | 17.753 ms | 6.331 ms | 236 KB |
| ZipXmlBuffered | 570.30 ms | 24.874 ms | 8.870 ms | 305 KB |
| Zip | 48.46 ms | 0.860 ms | 0.223 ms | 76 KB |

Same benchmark running against .NET 5.0:

| Method | Mean | Error | StdDev | Allocated |
| --- | ---: | ---: | ---: | ---: |
| Xml | 265.45 ms | 30.412 ms | 10.845 ms | 141 KB |
| ZipXml | 300.84 ms | 7.078 ms | 1.838 ms | 240 KB |
| ZipXmlBuffered | 297.89 ms | 15.315 ms | 5.462 ms | 304 KB |
| Zip | 51.62 ms | 0.718 ms | 0.256 ms | 76 KB |

I tried other combinations: constructing the XmlReader over a FileStream with bufferSize = 1 (which, as I understand it, disables FileStream's own buffering), over one with a 64 KB buffer, and over a StreamReader with various buffer sizes. Nothing made much of a difference. I'm happy to be wrong, but it really looks like the culprit remains in the XmlReader.Read method.
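One of those variants, for concreteness, can be sketched like this (the path and buffer sizes are whatever is under test, not fixed values from the issue):

```csharp
using System.IO;
using System.Xml;

static class ReaderVariants
{
    // Counts XML nodes read through a FileStream with an explicit buffer size.
    // bufferSize: 1 effectively disables FileStream's own buffering.
    public static int CountNodes(string path, int bufferSize)
    {
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                      FileShare.Read, bufferSize);
        using var reader = XmlReader.Create(fs);
        int nodes = 0;
        while (reader.Read()) nodes++;
        return nodes;
    }
}
```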

filipnavara commented 2 years ago

Can you share the source code of the benchmark, please? (Sorry, I accidentally scrolled over the part where you posted it.) For the record, this is the issue a member of our team filed for ML.NET. It is a nearly identical scenario of XML within a ZIP file.

MarkPflug commented 2 years ago

To clarify one thing: this issue shows up between .NET 5.0 and .NET 6.0. I mentioned .NET Framework only because that is what brought it to my attention.

XmlReader.Read is the core XML API for .NET, so I assume it should be considered performance-critical. This regression can also be seen in APIs such as XmlDocument.Load, which for the same file goes from 1,649 ms on .NET 5.0 to 1,893 ms on .NET 6.0.
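The shape of that XmlDocument repro, for reference (the XML literal here is a stand-in; the real sheet1.xml is of course far larger):

```csharp
using System.Xml;

static class XmlDocSketch
{
    // XmlDocument parsing drives the same underlying XmlReader loop,
    // so it inherits any regression in Read().
    public static string RootName(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.Name;
    }
}
```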

MarkPflug commented 2 years ago

Looking at the change history on XmlTextReaderImpl, there appear to be two significant changes in the last year (roughly since 5.0). Both were to clean up unsafe code.

The first one appears to remove unsafe code from the XmlCharType, which is used to categorize characters for the parser. I could easily see this one causing a performance regression, as it seems to be a core part of the parser. However, it looks like great care was taken to preserve the performance of the original. https://github.com/dotnet/runtime/commit/f1f7681464bc8e6662e15242d3babca75c914565

The "unsafe" code:

//internal byte* charProperties;
charProperties[str[i]]

Was replaced with "Unsafe" code:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static uint GetCharProperties(char ch) => Unsafe.Add(ref MemoryMarshal.GetReference(CharProperties), ch);

I think this `Unsafe` version is attempting to avoid the bounds check, just as the original pointer version did.
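The pattern can be sketched like this (names and table contents are illustrative, not the real XmlCharType data, which is backed by constant data compiled into the binary): a span-backed lookup table read either through the bounds-checked span indexer or through `Unsafe.Add`, which skips the check and therefore requires the caller to guarantee the index is in range:

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class CharClassSketch
{
    // Illustrative 128-entry ASCII property table (all zeros, for shape only).
    private static ReadOnlySpan<byte> Props => new byte[128];

    // Span indexer: bounds-checked.
    public static byte Checked(char ch) => Props[ch];

    // Unsafe.Add over the span's reference: no bounds check, so the caller
    // must guarantee ch < Props.Length.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static byte Unchecked(char ch) =>
        Unsafe.Add(ref MemoryMarshal.GetReference(Props), ch);
}
```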

The second change removes unsafe code from AdjustLineInfo:

https://github.com/dotnet/runtime/commit/997174511ab98c4fa409912db17ce18e7116b126

It doesn't appear to make any great effort to avoid bounds checks, but I don't know how important that is, as I don't know how often it's called. If it's called frequently, perhaps that's enough to show the performance degradation?

krwq commented 2 years ago

@MarkPflug can you try the following:

If that still produces the same bad results, I'll take a look and possibly revert that change. I can't promise it will make it into 6.0.

MarkPflug commented 2 years ago

@krwq I added two more benchmarks, one with MemoryStream as you requested, and one that uses a StringReader in an attempt to remove yet another layer of IO abstraction:

[Benchmark]
public void XmlMemoryStream()
{
    // memStream contains the file bytes
    memStream.Seek(0, SeekOrigin.Begin);
    var r = XmlReader.Create(memStream);
    while (r.Read()) ;
}

[Benchmark]
public void XmlStringReader()
{
    // xml is the string from File.ReadAllText
    var ms = new StringReader(xml);
    var r = XmlReader.Create(ms);
    while (r.Read()) ;
}

This doesn't change the performance significantly. Results:

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Xml | 511.39 ms | 8.539 ms | 1.321 ms | - | - | - | 348 KB |
| XmlMemoryStream | 517.00 ms | 22.310 ms | 7.956 ms | - | - | - | 140 KB |
| XmlStringReader | 505.42 ms | 5.684 ms | 1.476 ms | - | - | - | 155 KB |
| XmlLoad | 1,797.19 ms | 63.593 ms | 16.515 ms | 73000.0000 | 27000.0000 | 4000.0000 | 422,857 KB |
| ZipXml | 559.06 ms | 6.883 ms | 0.377 ms | - | - | - | 236 KB |
| ZipXmlBuffered | 565.74 ms | 37.710 ms | 13.448 ms | - | - | - | 305 KB |
| Zip | 48.02 ms | 0.539 ms | 0.083 ms | - | - | - | 76 KB |

krwq commented 2 years ago

@MarkPflug I'm seeing that when they did https://github.com/dotnet/runtime/pull/41756 they ran some benchmarks and there wasn't any significant difference. I'm curious: could you try going back with git to the state where this PR was merged and run the benchmark against your XML file? We should figure out whether some other change regressed this, or whether there is something specific in your XML that they missed when testing. Either way, we should permanently add this benchmark and improve this back.

MarkPflug commented 2 years ago

Uh, I've never built the runtime before, so I don't really know what I need to do. I pulled the repo and ran build.cmd. This took quite a while but seemed to finish successfully. I then checked out a commit prior to the #41756 version and ran build.cmd again. This time it failed with:

EXEC : error : DIA SDK is missing at "C:\Program Files\Microsoft Visual Studio\2022\Preview\DIA SDK". Did you install all the requirements for building on Windows, including the "Desktop Development with C++" workload?

I don't really think I'm going to be able to diagnose this.

eiriktsarpalis commented 2 years ago

@MarkPflug please see https://github.com/dotnet/runtime/blob/main/docs/workflow/requirements/windows-requirements.md for windows build prerequisites and instructions to build.

bgrainger commented 2 years ago

I'm not able to reproduce this regression on my machine; in fact, .NET 6.0 is fastest.

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1348 (21H2)
Intel Core i7-10875H CPU 2.30GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.100
  [Host]     : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
  Job-VFTWLF : .NET 5.0.12 (5.0.1221.52207), X64 RyuJIT
  Job-EVXDCZ : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
  Job-DHEGHK : .NET Framework 4.8 (4.8.4420.0), X64 RyuJIT
| Method | Job | Runtime | Toolchain | Mean | Error | StdDev | Ratio | RatioSD |
| --- | --- | --- | --- | ---: | ---: | ---: | ---: | ---: |
| ZipXml | Job-VFTWLF | .NET 5.0 | net5.0 | 316.1 ms | 4.77 ms | 8.61 ms | 1.00 | 0.00 |
| ZipXml | Job-EVXDCZ | .NET 6.0 | net6.0 | 282.0 ms | 5.63 ms | 10.15 ms | 0.89 | 0.03 |
| ZipXml | Job-DHEGHK | .NET Framework 4.8 | net48 | 332.6 ms | 2.78 ms | 2.32 ms | 1.03 | 0.04 |
| XmlStringReader | Job-VFTWLF | .NET 5.0 | net5.0 | 241.8 ms | 1.95 ms | 1.63 ms | 1.00 | 0.00 |
| XmlStringReader | Job-EVXDCZ | .NET 6.0 | net6.0 | 205.5 ms | 3.91 ms | 4.95 ms | 0.86 | 0.02 |
| XmlStringReader | Job-DHEGHK | .NET Framework 4.8 | net48 | 235.8 ms | 3.35 ms | 2.97 ms | 0.98 | 0.01 |

MarkPflug commented 2 years ago

@bgrainger is correct: the issue doesn't exist in 6.0. I figured out why my measurements were wrong: `COMPlus_TieredCompilation` was set to `0` on my machine. I don't remember ever setting it, but I guess I must have. This particular XML code path is apparently greatly affected by that setting on 6.0, whereas it wasn't on previous versions. Anyway, deleting that environment variable fixes this "regression". Whoops, sorry for wasting everyone's time.
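For anyone else chasing a phantom regression like this one, it may be worth checking for a tiered-compilation override up front. A minimal sketch (`DOTNET_TieredCompilation` is the newer alias the runtime also reads, alongside the older `COMPlus_` prefix):

```csharp
using System;

static class TieredCheck
{
    // Returns the raw override value, or null when unset
    // (tiered compilation is on by default).
    public static string TieredCompilationSetting() =>
        Environment.GetEnvironmentVariable("DOTNET_TieredCompilation")
        ?? Environment.GetEnvironmentVariable("COMPlus_TieredCompilation");
}
```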