Regressions in System.Xml.Linq.Perf_XName (FullPGO)

performanceautofiler[bot] commented 2 years ago

Run Information

Architecture	x64
OS	Windows 10.0.18362
Baseline	215c328e039b3a6b6bfdc6514253307105f25816
Compare	ca6716a658137b55d29b3ce100d5d14eb0edd67e
Diff	Diff

Regressions in System.Xml.Linq.Perf_XName

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio	Baseline ETL	Compare ETL
[NonEmptyNameSpaceToString - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_Windows 10.0.18362_PGOType=fullpgo/System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString.html>)	19.80 ns	24.26 ns	1.22	0.18	False

graph Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*'

### Payloads [Baseline]() [Compare]() ### Histogram #### System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417. IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM. IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-security, @vcsjones See info in area-owners.md if you want to be subscribed.

Issue Details

### Run Information Architecture | x64 -- | -- OS | Windows 10.0.18362 Baseline | [215c328e039b3a6b6bfdc6514253307105f25816](https://github.com/dotnet/runtime/commit/215c328e039b3a6b6bfdc6514253307105f25816) Compare | [ca6716a658137b55d29b3ce100d5d14eb0edd67e](https://github.com/dotnet/runtime/commit/ca6716a658137b55d29b3ce100d5d14eb0edd67e) Diff | [Diff](https://github.com/dotnet/runtime/compare/215c328e039b3a6b6bfdc6514253307105f25816...ca6716a658137b55d29b3ce100d5d14eb0edd67e) ### Regressions in System.Xml.Linq.Perf_XName Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [NonEmptyNameSpaceToString - Duration of single invocation]() | 19.80 ns | 24.26 ns | 1.22 | 0.18 | False | | | ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*' ```

### Payloads [Baseline]() [Compare]() ### Histogram #### System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417. IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM. IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Author:	performanceautofiler[bot]
Assignees:	-
Labels:	`area-System.Security`, `untriaged`
Milestone:	-

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-xml See info in area-owners.md if you want to be subscribed.

Issue Details

### Run Information Architecture | x64 -- | -- OS | Windows 10.0.18362 Baseline | [215c328e039b3a6b6bfdc6514253307105f25816](https://github.com/dotnet/runtime/commit/215c328e039b3a6b6bfdc6514253307105f25816) Compare | [ca6716a658137b55d29b3ce100d5d14eb0edd67e](https://github.com/dotnet/runtime/commit/ca6716a658137b55d29b3ce100d5d14eb0edd67e) Diff | [Diff](https://github.com/dotnet/runtime/compare/215c328e039b3a6b6bfdc6514253307105f25816...ca6716a658137b55d29b3ce100d5d14eb0edd67e) ### Regressions in System.Xml.Linq.Perf_XName Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [NonEmptyNameSpaceToString - Duration of single invocation]() | 19.80 ns | 24.26 ns | 1.22 | 0.18 | False | | | ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*' ```

### Payloads [Baseline]() [Compare]() ### Histogram #### System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417. IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM. IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Author:	performanceautofiler[bot]
Assignees:	-
Labels:	`area-System.Xml`, `tenet-performance`, `tenet-performance-benchmarks`, `untriaged`, `refs/heads/main`, `x64`, `Regression`, `RunKind=micro`, `CoreClr`, `Windows 10.0.18362`, `PGOType=fullpgo`
Milestone:	-

krwq commented 2 years ago

@adamsitnik this seems a bit like a noise - looking at graph my gut tells me this is well within stdev, any clues?

adamsitnik commented 2 years ago

@krwq that was my initial impression as well, but then I've clicked on "Toggle annotations" (1), the blue line (2) and moving average (3) and it's rather an actual regression:

eiriktsarpalis commented 2 years ago

Performance for the particular benchmarks appears to have regressed substantially over the recent months:

newplot

I don't believe we've made any change to Xml over that time, codegen related perhaps?

krwq commented 2 years ago

My initial perf results also confirm that we got slightly slower each time: net6.0 -> net7.0 preview 1 -> net7.0 preview 7.

The PerfView showes that that mostly string.Concat and coreclr contribute to that test. Looking at the results of all 3 I'm seeing we spend less time in string.Concat over time but more time in coreclr.

PerfView doesn't show me exactly where in coreclr we're spending more time (or I'm not sure how to find that out). I'll check with @adamsitnik to see if he has more clues how to get more grained results to figure out where to redirect this issue.

This is the app I've been measuring on:

using System;
using System.Diagnostics;
using System.Xml.Linq;

public class Program
{
    public static void Main()
    {
        XName hasNamespace = XName.Get("{http://www.example.test}Root");

        Stopwatch sw = Stopwatch.StartNew();
        const int n = 500000000;
        for (int i = 0; i < n; i++)
        {
            hasNamespace.ToString();
        }
        sw.Stop();

        Console.WriteLine($"{sw.ElapsedMilliseconds} ms");
    }
}

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details

### Run Information Architecture | x64 -- | -- OS | Windows 10.0.18362 Baseline | [215c328e039b3a6b6bfdc6514253307105f25816](https://github.com/dotnet/runtime/commit/215c328e039b3a6b6bfdc6514253307105f25816) Compare | [ca6716a658137b55d29b3ce100d5d14eb0edd67e](https://github.com/dotnet/runtime/commit/ca6716a658137b55d29b3ce100d5d14eb0edd67e) Diff | [Diff](https://github.com/dotnet/runtime/compare/215c328e039b3a6b6bfdc6514253307105f25816...ca6716a658137b55d29b3ce100d5d14eb0edd67e) ### Regressions in System.Xml.Linq.Perf_XName Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [NonEmptyNameSpaceToString - Duration of single invocation]() | 19.80 ns | 24.26 ns | 1.22 | 0.18 | False | | | ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*' ```

### Payloads [Baseline]() [Compare]() ### Histogram #### System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417. IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM. IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Author:	performanceautofiler[bot]
Assignees:	krwq
Labels:	`area-System.Xml`, `blocking-release`, `tenet-performance`, `tenet-performance-benchmarks`, `arch-x64`, `area-CodeGen-coreclr`, `runtime-coreclr`, `refs/heads/main`, `RunKind=micro`, `Windows 10.0.18362`, `PGOType=fullpgo`
Milestone:	7.0.0

krwq commented 2 years ago

Changing area since all XML is doing here is simple string concat. PerfView shows this regression as coming somewhere from coreclr

krwq commented 2 years ago

@adamsitnik showed me how to get more info. The regressed method shows to me as coreclr!FramedAllocateString (net6.0 vs net7 preview 7 especially)

net6.0:

net7.0 P7:

JulieLeeMSFT commented 2 years ago

@AndyAyersMS PTAL. Cc @dotnet/jit-contrib @mangod9.

EgorBo commented 2 years ago

Regression in FramedAllocateString might be a sign of a regression in GC e.g. GC Regions.

JulieLeeMSFT commented 2 years ago

FramedAllocateString might be a sign of a regression in GC e.g. GC Regions.

cc @Maoni0.

AndyAyersMS commented 2 years ago

@krwq in the profiles above are the two runs measuring the same amount of work (that is, same values for iterationCount and invocationCount ?

And, were you filtering the profile to just the actual stages?

Assuming so, it does not look like a codegen issue, the jit-generated code is actually a good deal faster.

AndyAyersMS commented 2 years ago

Also we haven't historically run triage on full PGO results; I'm a bit surprised we even opened this issue. If we look at perf of this test with the default settings we see

(default) newplot - 2022-08-29T112657 971

For purposes of comparison:

(full pgo)

newplot - 2022-08-29T113211 040

Just for completeness here's the other bit of data we have:

(dynamic pgo) newplot - 2022-08-29T113445 526

The regressions in the PGO modes on Jan 31 (which inspired this issue) seem to well correlated with #64521. But clearly since then all modes have regressed.

The most troubling of these is the recent regression in default perf: newplot - 2022-08-29T114443 411

We have a reporting outage around the first and largest regression. Commit range there has 66 commits: https://github.com/dotnet/runtime/compare/bceefa9fe6e46e3d0cf7d52786ee0666adccd2e5...e117479e2530ed4e0e83e3e27948b5675fda2b57. I think @DrewScoggins is trying to backfill data for that gap.

Source of the second regression is unclear; implicated commit range is https://github.com/dotnet/runtime/compare/9814b1ba9b3ec14dd4c144a6baa1a345b6dcba0e...e3b2af382b7e754d7ee67db8fa7b3c196836774e which is 16 commits.

For PGO there have been several regressions over the past months. It will take some time to sort through them all.

AndyAyersMS commented 2 years ago

PGO regressions (best guesses)

5/30: #69878
6./6: #63095
6/27: pgo update #71128
7/13: #71991 ?
8/4-8/7: One or more of
- 73533,
- 73434,
- 73243,
- 67389,
- 73375

AndyAyersMS commented 2 years ago

@krwq in the profiles above are the two runs measuring the same amount of work (that is, same values for iterationCount and invocationCount ?

And, were you filtering the profile to just the actual stages?

Assuming so, it does not look like a codegen issue, the jit-generated code is actually a good deal faster.

@krwq any info you can add here? If not, will start looking at this more closely tomorrow.

krwq commented 2 years ago

@AndyAyersMS yes, they measured exactly same amount of work. Unfortunately I don't have more info I could add here. I only observed newer code being consistently slower on my machine which is inline with what the issue description is showing and seems the only thing the XML code is doing is string concatenation and given PerfView also showed string related stuff as being slower it seems root cause is somewhere in coreclr but I don't know that code enough to pinpoint anything in particular.

AndyAyersMS commented 2 years ago

Looking at default behavior for now (that is, product defaults, not Full PGO), I see 10% regressions vs .NET 6:

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores .NET SDK=7.0.100-rc.1.22425.9 [Host] : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2 Job-LJOSDL : .NET 5.0.17 (5.0.1722.21314), X64 RyuJIT AVX2 Job-UKYGTY : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2 Job-HTVXBH : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 InvocationCount=9983200 IterationCount=25 IterationTime=250.0000 ms MaxIterationCount=20 MemoryRandomization=True MinIterationCount=15 WarmupCount=1

Method	Job	Runtime	Toolchain	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Gen 0	Allocated	Alloc Ratio
NonEmptyNameSpaceToString	Job-LJOSDL	.NET 5.0	net5.0	20.14 ns	0.253 ns	0.338 ns	20.04 ns	19.51 ns	20.77 ns	1.07	0.03	0.0127	80 B	1.00
NonEmptyNameSpaceToString	Job-UKYGTY	.NET 6.0	net6.0	18.92 ns	0.336 ns	0.448 ns	18.81 ns	18.34 ns	20.02 ns	1.00	0.00	0.0127	80 B	1.00
NonEmptyNameSpaceToString	Job-HTVXBH	.NET 7.0	net7.0	21.05 ns	0.240 ns	0.320 ns	21.01 ns	20.74 ns	22.34 ns	1.11	0.03	0.0127	80 B	1.00

Comparing filtered BDN profiles hasn't been illuminating yet, though they are similar to the ones from @krwq above in that the native part of the allocation seems slower.

Per discussion in #74771 the .NET 7 regression seems greatly reduced if we don't use region-based GC:

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores .NET SDK=7.0.100-rc.1.22425.9 [Host] : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2 Job-PYJGQF : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2

EnvironmentVariables=DOTNET_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000 Runtime=.NET 7.0 Toolchain=net7.0 InvocationCount=9983200 IterationCount=25 IterationTime=250.0000 ms MaxIterationCount=20 MemoryRandomization=True MinIterationCount=15 WarmupCount=1

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0	Allocated
NonEmptyNameSpaceToString	19.93 ns	0.470 ns	0.627 ns	19.84 ns	19.10 ns	22.15 ns	0.0127	80 B

and a similar with DOTNET_GCWriteBarrier=3

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores .NET SDK=7.0.100-rc.1.22425.9 [Host] : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2 Job-HBMGZU : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2

EnvironmentVariables=DOTNET_GCWriteBarrier=3 PowerPlanMode=00000000-0000-0000-0000-000000000000 Runtime=.NET 7.0 Toolchain=net7.0 InvocationCount=9983200 IterationCount=25 IterationTime=250.0000 ms MaxIterationCount=20 MemoryRandomization=True MinIterationCount=15 WarmupCount=1

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0	Allocated
NonEmptyNameSpaceToString	20.04 ns	0.630 ns	0.842 ns	19.96 ns	18.94 ns	23.21 ns	0.0127	80 B

AndyAyersMS commented 2 years ago

Moving this to area-gc since it seems to be related to recent gc changes.

Again, this is mainly about the non-PGO behavior.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

Issue Details

### Run Information Architecture | x64 -- | -- OS | Windows 10.0.18362 Baseline | [215c328e039b3a6b6bfdc6514253307105f25816](https://github.com/dotnet/runtime/commit/215c328e039b3a6b6bfdc6514253307105f25816) Compare | [ca6716a658137b55d29b3ce100d5d14eb0edd67e](https://github.com/dotnet/runtime/commit/ca6716a658137b55d29b3ce100d5d14eb0edd67e) Diff | [Diff](https://github.com/dotnet/runtime/compare/215c328e039b3a6b6bfdc6514253307105f25816...ca6716a658137b55d29b3ce100d5d14eb0edd67e) ### Regressions in System.Xml.Linq.Perf_XName Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [NonEmptyNameSpaceToString - Duration of single invocation]() | 19.80 ns | 24.26 ns | 1.22 | 0.18 | False | | | ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*' ```

### Payloads [Baseline]() [Compare]() ### Histogram #### System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417. IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM. IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Author:	performanceautofiler[bot]
Assignees:	AndyAyersMS
Labels:	`blocking-release`, `tenet-performance`, `tenet-performance-benchmarks`, `arch-x64`, `area-CodeGen-coreclr`, `area-GC-coreclr`, `runtime-coreclr`, `refs/heads/main`, `RunKind=micro`, `Windows 10.0.18362`, `PGOType=fullpgo`
Milestone:	7.0.0

JulieLeeMSFT commented 2 years ago

@AndyAyersMS, are you still looking into this? Should we assign a different owner in GC team?

mrsharm commented 2 years ago

I'll take over from here since this is prospectively a GC issue. Investigation in Progress.

mrsharm commented 2 years ago

Data

Without GC Normalization

There seems to be a regression within a 5% margin:

Benchmark Name	Baseline	Comparand	Baseline Mean Duration	Comparand Mean Duration	Δ Mean Duration	Δ% Mean Duration	Baseline number of iterations	Comparand number of iterations	Δ number of iterations	Δ% number of iterations	Baseline gc count	Comparand gc count	Δ gc count	Δ% gc count	Baseline median	Comparand median	Δ median	Δ% median	Baseline non induced gc count	Comparand non induced gc count	Δ non induced gc count	Δ% non induced gc count	Baseline total allocated (mb)	Comparand total allocated (mb)	Δ total allocated (mb)	Δ% total allocated (mb)	Baseline total pause time (msec)	Comparand total pause time (msec)	Δ total pause time (msec)	Δ% total pause time (msec)	Baseline gc pause time %	Comparand gc pause time %	Δ gc pause time %	Δ% gc pause time %	Baseline JIT_WriteBarrier	Comparand JIT_WriteBarrier	Δ JIT_WriteBarrier	Δ% JIT_WriteBarrier	Baseline JIT_ByRefWriteBarrier	Comparand JIT_ByRefWriteBarrier	Δ JIT_ByRefWriteBarrier	Δ% JIT_ByRefWriteBarrier
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	writebarrier_3	19.37	19.85	0.48	2.48	18	15	-3	-16.67	2612	2497	-115	-4.4	19.27	19.62	0.35	1.8	2376	2257	-119	-5.01	25090.88	23737.18	-1353.7	-5.4	11.29	11.26	-0.03	-0.3	2.4	2.5	0.11	4.42	0	344	344	∞	0	0	0	NaN
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	regions	19.37	19.46	0.09	0.44	18	15	-3	-16.67	2612	2530	-82	-3.14	19.27	19.27	-0.01	-0.03	2376	2290	-86	-3.62	25090.88	24207.49	-883.39	-3.52	11.29	11.41	0.12	1.07	2.4	2.58	0.19	7.74	0	0	0	NaN	0	0	0	NaN

Baseline - Segments

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores .NET SDK=7.0.100-preview.7.22377.5 [Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2 Job-FQYTMY : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

EnvironmentVariables=COMPlus_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0	Allocated
NonEmptyNameSpaceToString	19.37 ns	0.438 ns	0.469 ns	19.27 ns	18.87 ns	20.32 ns	0.0076	80 B

Regions

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores .NET SDK=7.0.100-preview.7.22377.5 [Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2 Job-HLWNHM : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0	Allocated
NonEmptyNameSpaceToString	19.46 ns	0.352 ns	0.329 ns	19.27 ns	19.09 ns	19.97 ns	0.0076	80 B

Regions w/ More Precise Writebarrier

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores .NET SDK=7.0.100-preview.7.22377.5 [Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2 Job-QBQSZP : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

EnvironmentVariables=COMPlus_GCWriteBarrier=3,COMPlus_EnableWriteXorExecute=0 PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

Method	Mean	Error	StdDev	Median	Min	Max	Gen 0	Allocated
NonEmptyNameSpaceToString	19.85 ns	0.429 ns	0.401 ns	19.62 ns	19.45 ns	20.66 ns	0.0076	80 B

With GC Normalization

Benchmark Name	Baseline	Comparand	Baseline Mean Duration	Comparand Mean Duration	Δ Mean Duration	Δ% Mean Duration	Baseline number of iterations	Comparand number of iterations	Δ number of iterations	Δ% number of iterations	Baseline gc count	Comparand gc count	Δ gc count	Δ% gc count	Baseline median	Comparand median	Δ median	Δ% median	Baseline non induced gc count	Comparand non induced gc count	Δ non induced gc count	Δ% non induced gc count	Baseline total allocated (mb)	Comparand total allocated (mb)	Δ total allocated (mb)	Δ% total allocated (mb)	Baseline total pause time (msec)	Comparand total pause time (msec)	Δ total pause time (msec)	Δ% total pause time (msec)	Baseline gc pause time %	Comparand gc pause time %	Δ gc pause time %	Δ% gc pause time %	Baseline JIT_WriteBarrier	Comparand JIT_WriteBarrier	Δ JIT_WriteBarrier	Δ% JIT_WriteBarrier	Baseline JIT_ByRefWriteBarrier	Comparand JIT_ByRefWriteBarrier	Δ JIT_ByRefWriteBarrier	Δ% JIT_ByRefWriteBarrier
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	regions	108.59	322.66	214.06	197.12	128	128	0	0	4	4	0	0	100	0	-100	-100	0	0	0	NaN	1.89	1.88	-0	-0.23	1.52	1.52	0.01	0.39	1.89	1.83	-0.06	-3.41	0	0	0	NaN	0	0	0	NaN
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	writebarrier_3	108.59	178.52	69.92	64.39	128	128	0	0	4	4	0	0	100	50	-50	-50	0	0	0	NaN	1.89	1.88	-0.01	-0.34	1.52	1.52	0.01	0.41	1.89	2.02	0.13	7.11	0	0	0	NaN	0	0	0	NaN

Baseline - Segments

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores .NET SDK=7.0.100-preview.7.22377.5 [Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2 Job-QRXOHO : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

OutlierMode=DontRemove EnvironmentVariables=COMPlus_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000
Force=False InvocationCount=1 IterationCount=128
IterationTime=250.0000 ms MaxIterationCount=20 MinIterationCount=15
UnrollFactor=1 WarmupCount=1

Method	Mean	Error	StdDev	StdErr	Median	Min	Max	Q1	Q3	Op/s	Allocated
NonEmptyNameSpaceToString	108.6 ns	45.77 ns	153.7 ns	13.59 ns	100.0 ns	0.0 ns	1,000.0 ns	100.0 ns	100.0 ns	9,208,633.1	488 B

Regions

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores .NET SDK=7.0.100-preview.7.22377.5 [Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2 Job-VEMLHO : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

OutlierMode=DontRemove PowerPlanMode=00000000-0000-0000-0000-000000000000 Force=False
InvocationCount=1 IterationCount=128 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 UnrollFactor=1
WarmupCount=1

Method	Mean	Error	StdDev	StdErr	Median	Min	Max	Q1	Q3	Op/s	Allocated
NonEmptyNameSpaceToString	322.7 ns	431.3 ns	1,448.5 ns	128.0 ns	0.0 ns	0.0 ns	14,300.0 ns	0.0 ns	0.0 ns	3,099,273.6	488 B

Regions w/ More Precise Writebarrier

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores .NET SDK=7.0.100-preview.7.22377.5 [Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2 Job-LKBXKI : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

OutlierMode=DontRemove EnvironmentVariables=COMPlus_GCWriteBarrier=3,COMPlus_EnableWriteXorExecute=0 PowerPlanMode=00000000-0000-0000-0000-000000000000
Force=False InvocationCount=1 IterationCount=128
IterationTime=250.0000 ms MaxIterationCount=20 MinIterationCount=15
UnrollFactor=1 WarmupCount=1

Method	Mean	Error	StdDev	StdErr	Median	Min	Max	Q1	Q3	Op/s	Allocated
NonEmptyNameSpaceToString	178.5 ns	155.5 ns	522.3 ns	46.17 ns	50.00 ns	0.0 ns	2,950.0 ns	50.00 ns	50.00 ns	5,601,750.5	488 B

mrsharm commented 2 years ago

We have been working through root-causing and fixing this issue and here are our conclusions:

We root caused this issue to the introduction of a higher number and size of decommits for regions (enabled in 7.0) as opposed to segments (default in 6.0).
We are working on a fix: https://github.com/dotnet/runtime/pull/73620 that improves the numbers for these benchmarks, however, even with this fix we are not at the same level as segments in terms of mean execution time, number and size of virtual commits and decommits as before and this is something we plan to further address.
We think the next steps here are to merge the fix that improves the regression and work on further improvements in 8.0. More specifically, we should merge the fix mentioned above and move the issue to 8.0 (cc: @mangod9)
We found that running the benchmarks above using default parameters results in an inequitable for our GC evaluation comparison purposes where:
- There are an unequal number of invocations per iteration resulting in a lob sided effect on the mean. Fixed by passing in a derived invocation count by doing a dry run and making use of that invocation count for all our runs.
- Outliers are removed. Fixed by not removing outliers by setting the following in the command line args to run the microbenchmarks --outliers DontRemove.
- There are a variable number of induced GCs resulting from the unequal number of invocations and evaluation overhead runs. Fixed by removing the evaluation overhead stage.
Using a more equitable comparison repeated a number of times, we consistently observed the following:
- The regressions for the tests for regions were never above 10%.
- More consistent results that were less volatile.

Results

Benchmark	Baseline	Comparand	ΔMean	ΔMean %
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	regions	1.56	7.62
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	decommit_fix	0.97	4.75

Details

Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
  [Host]     : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
  Job-OERVVK : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

The baseline run is based on Peter’s fix rebased on the latest main from yesterday and using the segments’ CLR GC.
The regions run is based on the latest main from yesterday.
The decommit_fix is based on Peter’s fix rebased on the latest main.
The commit hash for the main used: 82d76a4bea7fabe8b1acfd9d3287c3a15d86b4a8

Full comparison table with the virtual commit and decommit data: Compare.xlsx

Tracking these issues here: https://github.com/dotnet/runtime/issues/73592

CC: @dotnet/gc

mangod9 commented 2 years ago

Thanks @mrsharm. @adamsitnik checking that you agree with the analysis of ensuring the GC impact is equitable across runs and with that the regression is not too high?

AndyAyersMS commented 2 years ago

I have seen this effect when comparing things too (say OSR on vs off) -- if the two variants behave differently BDN may change its measurement approach to the point where the measurement strategy becomes a contributing factor in the overall reported results.

Keeping things the "same" across the two sets of measurements often resolves this, though not always.

mangod9 commented 2 years ago

is this the .NET equivalent of the uncertainty principle :)

mrsharm commented 2 years ago

Keeping things the "same" across the two sets of measurements often resolves this, though not always.

Definitely, that's why we kept an eye on standard error and standard error normalized by the mean as well to ensure the repeatability of these experiments:

Benchmark Name	Baseline	Comparand	Δ standard error	Δ standard error / mean
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	regions	-0.2263	-0.0114
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString	segments	decommit_fix	-0.2705	-0.0133

Getting to this low standard error was a matter of discerning a high enough invocation count described above. The result was a consistent and equitable comparison i.e., we ran this experiment for 20+ times and got similar results.

mangod9 commented 2 years ago

Assume its ok to close this now since the normalized regression after the decommit fix is reasonable?

dotnet / runtime