dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.18k stars 4.72k forks source link

Regressions in System.Xml.Linq.Perf_XName (FullPGO) #64626

Closed performanceautofiler[bot] closed 2 years ago

performanceautofiler[bot] commented 2 years ago

Run Information

Architecture x64
OS Windows 10.0.18362
Baseline 215c328e039b3a6b6bfdc6514253307105f25816
Compare ca6716a658137b55d29b3ce100d5d14eb0edd67e
Diff Diff

Regressions in System.Xml.Linq.Perf_XName

Benchmark Baseline Test Test/Base Test Quality Edge Detector Baseline IR Compare IR IR Ratio Baseline ETL Compare ETL
[NonEmptyNameSpaceToString - Duration of single invocation](<https://pvscmdupload.blob.core.windows.net/reports/allTestHistory/refs/heads/main_x64_Windows 10.0.18362_PGOType=fullpgo/System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString.html>) 19.80 ns 24.26 ns 1.22 0.18 False

graph Test Report

Repro

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*'
### Payloads [Baseline]() [Compare]() ### Histogram #### System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417. IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM. IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-security, @vcsjones See info in area-owners.md if you want to be subscribed.

Issue Details
### Run Information Architecture | x64 -- | -- OS | Windows 10.0.18362 Baseline | [215c328e039b3a6b6bfdc6514253307105f25816](https://github.com/dotnet/runtime/commit/215c328e039b3a6b6bfdc6514253307105f25816) Compare | [ca6716a658137b55d29b3ce100d5d14eb0edd67e](https://github.com/dotnet/runtime/commit/ca6716a658137b55d29b3ce100d5d14eb0edd67e) Diff | [Diff](https://github.com/dotnet/runtime/compare/215c328e039b3a6b6bfdc6514253307105f25816...ca6716a658137b55d29b3ce100d5d14eb0edd67e) ### Regressions in System.Xml.Linq.Perf_XName Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [NonEmptyNameSpaceToString - Duration of single invocation]() | 19.80 ns | 24.26 ns | 1.22 | 0.18 | False | | | ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*' ```
### Payloads [Baseline]() [Compare]() ### Histogram #### System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417. IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM. IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
Author: performanceautofiler[bot]
Assignees: -
Labels: `area-System.Security`, `untriaged`
Milestone: -
ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-xml See info in area-owners.md if you want to be subscribed.

Issue Details
### Run Information Architecture | x64 -- | -- OS | Windows 10.0.18362 Baseline | [215c328e039b3a6b6bfdc6514253307105f25816](https://github.com/dotnet/runtime/commit/215c328e039b3a6b6bfdc6514253307105f25816) Compare | [ca6716a658137b55d29b3ce100d5d14eb0edd67e](https://github.com/dotnet/runtime/commit/ca6716a658137b55d29b3ce100d5d14eb0edd67e) Diff | [Diff](https://github.com/dotnet/runtime/compare/215c328e039b3a6b6bfdc6514253307105f25816...ca6716a658137b55d29b3ce100d5d14eb0edd67e) ### Regressions in System.Xml.Linq.Perf_XName Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [NonEmptyNameSpaceToString - Duration of single invocation]() | 19.80 ns | 24.26 ns | 1.22 | 0.18 | False | | | ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*' ```
### Payloads [Baseline]() [Compare]() ### Histogram #### System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417. IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM. IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
Author: performanceautofiler[bot]
Assignees: -
Labels: `area-System.Xml`, `tenet-performance`, `tenet-performance-benchmarks`, `untriaged`, `refs/heads/main`, `x64`, `Regression`, `RunKind=micro`, `CoreClr`, `Windows 10.0.18362`, `PGOType=fullpgo`
Milestone: -
krwq commented 2 years ago

@adamsitnik this seems a bit like a noise - looking at graph my gut tells me this is well within stdev, any clues?

adamsitnik commented 2 years ago

@krwq that was my initial impression as well, but then I've clicked on "Toggle annotations" (1), the blue line (2) and moving average (3) and it's rather an actual regression:

image

eiriktsarpalis commented 2 years ago

Performance for the particular benchmarks appears to have regressed substantially over the recent months:

newplot

I don't believe we've made any change to Xml over that time, codegen related perhaps?

krwq commented 2 years ago

My initial perf results also confirm that we got slightly slower each time: net6.0 -> net7.0 preview 1 -> net7.0 preview 7.

The PerfView showes that that mostly string.Concat and coreclr contribute to that test. Looking at the results of all 3 I'm seeing we spend less time in string.Concat over time but more time in coreclr.

PerfView doesn't show me exactly where in coreclr we're spending more time (or I'm not sure how to find that out). I'll check with @adamsitnik to see if he has more clues how to get more grained results to figure out where to redirect this issue.

This is the app I've been measuring on:

using System;
using System.Diagnostics;
using System.Xml.Linq;

public class Program
{
    public static void Main()
    {
        XName hasNamespace = XName.Get("{http://www.example.test}Root");

        Stopwatch sw = Stopwatch.StartNew();
        const int n = 500000000;
        for (int i = 0; i < n; i++)
        {
            hasNamespace.ToString();
        }
        sw.Stop();

        Console.WriteLine($"{sw.ElapsedMilliseconds} ms");
    }
}
ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details
### Run Information Architecture | x64 -- | -- OS | Windows 10.0.18362 Baseline | [215c328e039b3a6b6bfdc6514253307105f25816](https://github.com/dotnet/runtime/commit/215c328e039b3a6b6bfdc6514253307105f25816) Compare | [ca6716a658137b55d29b3ce100d5d14eb0edd67e](https://github.com/dotnet/runtime/commit/ca6716a658137b55d29b3ce100d5d14eb0edd67e) Diff | [Diff](https://github.com/dotnet/runtime/compare/215c328e039b3a6b6bfdc6514253307105f25816...ca6716a658137b55d29b3ce100d5d14eb0edd67e) ### Regressions in System.Xml.Linq.Perf_XName Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [NonEmptyNameSpaceToString - Duration of single invocation]() | 19.80 ns | 24.26 ns | 1.22 | 0.18 | False | | | ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*' ```
### Payloads [Baseline]() [Compare]() ### Histogram #### System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417. IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM. IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
Author: performanceautofiler[bot]
Assignees: krwq
Labels: `area-System.Xml`, `blocking-release`, `tenet-performance`, `tenet-performance-benchmarks`, `arch-x64`, `area-CodeGen-coreclr`, `runtime-coreclr`, `refs/heads/main`, `RunKind=micro`, `Windows 10.0.18362`, `PGOType=fullpgo`
Milestone: 7.0.0
krwq commented 2 years ago

Changing area since all XML is doing here is simple string concat. PerfView shows this regression as coming somewhere from coreclr

krwq commented 2 years ago

@adamsitnik showed me how to get more info. The regressed method shows to me as coreclr!FramedAllocateString (net6.0 vs net7 preview 7 especially)

net6.0: image

net7.0 P7: image

JulieLeeMSFT commented 2 years ago

@AndyAyersMS PTAL. Cc @dotnet/jit-contrib @mangod9.

EgorBo commented 2 years ago

Regression in FramedAllocateString might be a sign of a regression in GC e.g. GC Regions.

JulieLeeMSFT commented 2 years ago

FramedAllocateString might be a sign of a regression in GC e.g. GC Regions.

cc @Maoni0.

AndyAyersMS commented 2 years ago

@krwq in the profiles above are the two runs measuring the same amount of work (that is, same values for iterationCount and invocationCount ?

And, were you filtering the profile to just the actual stages?

Assuming so, it does not look like a codegen issue, the jit-generated code is actually a good deal faster.

AndyAyersMS commented 2 years ago

Also we haven't historically run triage on full PGO results; I'm a bit surprised we even opened this issue. If we look at perf of this test with the default settings we see

(default) newplot - 2022-08-29T112657 971

For purposes of comparison:

(full pgo)

newplot - 2022-08-29T113211 040

Just for completeness here's the other bit of data we have:

(dynamic pgo) newplot - 2022-08-29T113445 526

The regressions in the PGO modes on Jan 31 (which inspired this issue) seem to well correlated with #64521. But clearly since then all modes have regressed.

The most troubling of these is the recent regression in default perf: newplot - 2022-08-29T114443 411

We have a reporting outage around the first and largest regression. Commit range there has 66 commits: https://github.com/dotnet/runtime/compare/bceefa9fe6e46e3d0cf7d52786ee0666adccd2e5...e117479e2530ed4e0e83e3e27948b5675fda2b57. I think @DrewScoggins is trying to backfill data for that gap.

Source of the second regression is unclear; implicated commit range is https://github.com/dotnet/runtime/compare/9814b1ba9b3ec14dd4c144a6baa1a345b6dcba0e...e3b2af382b7e754d7ee67db8fa7b3c196836774e which is 16 commits.

For PGO there have been several regressions over the past months. It will take some time to sort through them all.

AndyAyersMS commented 2 years ago

PGO regressions (best guesses)

AndyAyersMS commented 2 years ago

@krwq in the profiles above are the two runs measuring the same amount of work (that is, same values for iterationCount and invocationCount ?

And, were you filtering the profile to just the actual stages?

Assuming so, it does not look like a codegen issue, the jit-generated code is actually a good deal faster.

@krwq any info you can add here? If not, will start looking at this more closely tomorrow.

krwq commented 2 years ago

@AndyAyersMS yes, they measured exactly same amount of work. Unfortunately I don't have more info I could add here. I only observed newer code being consistently slower on my machine which is inline with what the issue description is showing and seems the only thing the XML code is doing is string concatenation and given PerfView also showed string related stuff as being slower it seems root cause is somewhere in coreclr but I don't know that code enough to pinpoint anything in particular.

AndyAyersMS commented 2 years ago

Looking at default behavior for now (that is, product defaults, not Full PGO), I see 10% regressions vs .NET 6:

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores .NET SDK=7.0.100-rc.1.22425.9 [Host] : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2 Job-LJOSDL : .NET 5.0.17 (5.0.1722.21314), X64 RyuJIT AVX2 Job-UKYGTY : .NET 6.0.8 (6.0.822.36306), X64 RyuJIT AVX2 Job-HTVXBH : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 InvocationCount=9983200 IterationCount=25 IterationTime=250.0000 ms MaxIterationCount=20 MemoryRandomization=True MinIterationCount=15 WarmupCount=1

Method Job Runtime Toolchain Mean Error StdDev Median Min Max Ratio RatioSD Gen 0 Allocated Alloc Ratio
NonEmptyNameSpaceToString Job-LJOSDL .NET 5.0 net5.0 20.14 ns 0.253 ns 0.338 ns 20.04 ns 19.51 ns 20.77 ns 1.07 0.03 0.0127 80 B 1.00
NonEmptyNameSpaceToString Job-UKYGTY .NET 6.0 net6.0 18.92 ns 0.336 ns 0.448 ns 18.81 ns 18.34 ns 20.02 ns 1.00 0.00 0.0127 80 B 1.00
NonEmptyNameSpaceToString Job-HTVXBH .NET 7.0 net7.0 21.05 ns 0.240 ns 0.320 ns 21.01 ns 20.74 ns 22.34 ns 1.11 0.03 0.0127 80 B 1.00

Comparing filtered BDN profiles hasn't been illuminating yet, though they are similar to the ones from @krwq above in that the native part of the allocation seems slower.

Per discussion in #74771 the .NET 7 regression seems greatly reduced if we don't use region-based GC:

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores .NET SDK=7.0.100-rc.1.22425.9 [Host] : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2 Job-PYJGQF : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2

EnvironmentVariables=DOTNET_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000 Runtime=.NET 7.0 Toolchain=net7.0 InvocationCount=9983200 IterationCount=25 IterationTime=250.0000 ms MaxIterationCount=20 MemoryRandomization=True MinIterationCount=15 WarmupCount=1

Method Mean Error StdDev Median Min Max Gen 0 Allocated
NonEmptyNameSpaceToString 19.93 ns 0.470 ns 0.627 ns 19.84 ns 19.10 ns 22.15 ns 0.0127 80 B

and a similar with DOTNET_GCWriteBarrier=3

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores .NET SDK=7.0.100-rc.1.22425.9 [Host] : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2 Job-HBMGZU : .NET 7.0.0 (7.0.22.42316), X64 RyuJIT AVX2

EnvironmentVariables=DOTNET_GCWriteBarrier=3 PowerPlanMode=00000000-0000-0000-0000-000000000000 Runtime=.NET 7.0 Toolchain=net7.0 InvocationCount=9983200 IterationCount=25 IterationTime=250.0000 ms MaxIterationCount=20 MemoryRandomization=True MinIterationCount=15 WarmupCount=1

Method Mean Error StdDev Median Min Max Gen 0 Allocated
NonEmptyNameSpaceToString 20.04 ns 0.630 ns 0.842 ns 19.96 ns 18.94 ns 23.21 ns 0.0127 80 B
AndyAyersMS commented 2 years ago

Moving this to area-gc since it seems to be related to recent gc changes.

Again, this is mainly about the non-PGO behavior.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/gc See info in area-owners.md if you want to be subscribed.

Issue Details
### Run Information Architecture | x64 -- | -- OS | Windows 10.0.18362 Baseline | [215c328e039b3a6b6bfdc6514253307105f25816](https://github.com/dotnet/runtime/commit/215c328e039b3a6b6bfdc6514253307105f25816) Compare | [ca6716a658137b55d29b3ce100d5d14eb0edd67e](https://github.com/dotnet/runtime/commit/ca6716a658137b55d29b3ce100d5d14eb0edd67e) Diff | [Diff](https://github.com/dotnet/runtime/compare/215c328e039b3a6b6bfdc6514253307105f25816...ca6716a658137b55d29b3ce100d5d14eb0edd67e) ### Regressions in System.Xml.Linq.Perf_XName Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- [NonEmptyNameSpaceToString - Duration of single invocation]() | 19.80 ns | 24.26 ns | 1.22 | 0.18 | False | | | ![graph]() [Test Report]() ### Repro ```cmd git clone https://github.com/dotnet/performance.git py .\performance\scripts\benchmarks_ci.py -f net6.0 --filter 'System.Xml.Linq.Perf_XName*' ```
### Payloads [Baseline]() [Compare]() ### Histogram #### System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString ```log ``` ### Description of detection logic ```IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsRegressionBase: Marked as regression because the compare was 5% greater than the baseline, and the value was not too small. IsRegressionChecked: Marked as regression because the three check build points were 0.05 greater than the baseline. IsRegressionWindowed: Marked as regression because 24.260317557881482 > 21.467619909142417. IsChangePoint: Marked as a change because one of 1/31/2022 1:37:35 PM, 2/1/2022 5:51:01 AM falls between 1/23/2022 2:39:24 PM and 2/1/2022 5:51:01 AM. IsRegressionStdDev: Marked as regression because -27.429216259968307 (T) = (0 -23.97864052999163) / Math.Sqrt((0.403977404141772 / (75)) + (0.059125059664990244 / (4))) is less than -1.991254395387924 = MathNet.Numerics.Distributions.StudentT.InvCDF(0, 1, (75) + (4) - 2, .025) and -0.19395673620667156 = (20.08334121567448 - 23.97864052999163) / 20.08334121567448 is less than -0.05. IsImprovementBase: Marked as not an improvement because the compare was not 5% less than the baseline, or the value was too small. IsChangeEdgeDetector: Marked not as a regression because Edge Detector said so. ``` ### Docs [Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md) [Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)
Author: performanceautofiler[bot]
Assignees: AndyAyersMS
Labels: `blocking-release`, `tenet-performance`, `tenet-performance-benchmarks`, `arch-x64`, `area-CodeGen-coreclr`, `area-GC-coreclr`, `runtime-coreclr`, `refs/heads/main`, `RunKind=micro`, `Windows 10.0.18362`, `PGOType=fullpgo`
Milestone: 7.0.0
JulieLeeMSFT commented 2 years ago

@AndyAyersMS, are you still looking into this? Should we assign a different owner in GC team?

mrsharm commented 2 years ago

I'll take over from here since this is prospectively a GC issue. Investigation in Progress.

mrsharm commented 2 years ago

Data

Without GC Normalization

There seems to be a regression within a 5% margin:

Benchmark Name Baseline Comparand Baseline Mean Duration Comparand Mean Duration Δ Mean Duration Δ% Mean Duration Baseline number of iterations Comparand number of iterations Δ number of iterations Δ% number of iterations Baseline gc count Comparand gc count Δ gc count Δ% gc count Baseline median Comparand median Δ median Δ% median Baseline non induced gc count Comparand non induced gc count Δ non induced gc count Δ% non induced gc count Baseline total allocated (mb) Comparand total allocated (mb) Δ total allocated (mb) Δ% total allocated (mb) Baseline total pause time (msec) Comparand total pause time (msec) Δ total pause time (msec) Δ% total pause time (msec) Baseline gc pause time % Comparand gc pause time % Δ gc pause time % Δ% gc pause time % Baseline JIT_WriteBarrier Comparand JIT_WriteBarrier Δ JIT_WriteBarrier Δ% JIT_WriteBarrier Baseline JIT_ByRefWriteBarrier Comparand JIT_ByRefWriteBarrier Δ JIT_ByRefWriteBarrier Δ% JIT_ByRefWriteBarrier
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments writebarrier_3 19.37 19.85 0.48 2.48 18 15 -3 -16.67 2612 2497 -115 -4.4 19.27 19.62 0.35 1.8 2376 2257 -119 -5.01 25090.88 23737.18 -1353.7 -5.4 11.29 11.26 -0.03 -0.3 2.4 2.5 0.11 4.42 0 344 344 0 0 0 NaN
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments regions 19.37 19.46 0.09 0.44 18 15 -3 -16.67 2612 2530 -82 -3.14 19.27 19.27 -0.01 -0.03 2376 2290 -86 -3.62 25090.88 24207.49 -883.39 -3.52 11.29 11.41 0.12 1.07 2.4 2.58 0.19 7.74 0 0 0 NaN 0 0 0 NaN

Baseline - Segments

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores .NET SDK=7.0.100-preview.7.22377.5 [Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2 Job-FQYTMY : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

EnvironmentVariables=COMPlus_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

Method Mean Error StdDev Median Min Max Gen 0 Allocated
NonEmptyNameSpaceToString 19.37 ns 0.438 ns 0.469 ns 19.27 ns 18.87 ns 20.32 ns 0.0076 80 B

Regions

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores .NET SDK=7.0.100-preview.7.22377.5 [Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2 Job-HLWNHM : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1

Method Mean Error StdDev Median Min Max Gen 0 Allocated
NonEmptyNameSpaceToString 19.46 ns 0.352 ns 0.329 ns 19.27 ns 19.09 ns 19.97 ns 0.0076 80 B

Regions w/ More Precise Writebarrier

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores .NET SDK=7.0.100-preview.7.22377.5 [Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2 Job-QBQSZP : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

EnvironmentVariables=COMPlus_GCWriteBarrier=3,COMPlus_EnableWriteXorExecute=0 PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

Method Mean Error StdDev Median Min Max Gen 0 Allocated
NonEmptyNameSpaceToString 19.85 ns 0.429 ns 0.401 ns 19.62 ns 19.45 ns 20.66 ns 0.0076 80 B

With GC Normalization

Benchmark Name Baseline Comparand Baseline Mean Duration Comparand Mean Duration Δ Mean Duration Δ% Mean Duration Baseline number of iterations Comparand number of iterations Δ number of iterations Δ% number of iterations Baseline gc count Comparand gc count Δ gc count Δ% gc count Baseline median Comparand median Δ median Δ% median Baseline non induced gc count Comparand non induced gc count Δ non induced gc count Δ% non induced gc count Baseline total allocated (mb) Comparand total allocated (mb) Δ total allocated (mb) Δ% total allocated (mb) Baseline total pause time (msec) Comparand total pause time (msec) Δ total pause time (msec) Δ% total pause time (msec) Baseline gc pause time % Comparand gc pause time % Δ gc pause time % Δ% gc pause time % Baseline JIT_WriteBarrier Comparand JIT_WriteBarrier Δ JIT_WriteBarrier Δ% JIT_WriteBarrier Baseline JIT_ByRefWriteBarrier Comparand JIT_ByRefWriteBarrier Δ JIT_ByRefWriteBarrier Δ% JIT_ByRefWriteBarrier
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments regions 108.59 322.66 214.06 197.12 128 128 0 0 4 4 0 0 100 0 -100 -100 0 0 0 NaN 1.89 1.88 -0 -0.23 1.52 1.52 0.01 0.39 1.89 1.83 -0.06 -3.41 0 0 0 NaN 0 0 0 NaN
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments writebarrier_3 108.59 178.52 69.92 64.39 128 128 0 0 4 4 0 0 100 50 -50 -50 0 0 0 NaN 1.89 1.88 -0.01 -0.34 1.52 1.52 0.01 0.41 1.89 2.02 0.13 7.11 0 0 0 NaN 0 0 0 NaN

Baseline - Segments

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores .NET SDK=7.0.100-preview.7.22377.5 [Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2 Job-QRXOHO : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

OutlierMode=DontRemove EnvironmentVariables=COMPlus_GCName=clrgc.dll PowerPlanMode=00000000-0000-0000-0000-000000000000
Force=False InvocationCount=1 IterationCount=128
IterationTime=250.0000 ms MaxIterationCount=20 MinIterationCount=15
UnrollFactor=1 WarmupCount=1

Method Mean Error StdDev StdErr Median Min Max Q1 Q3 Op/s Allocated
NonEmptyNameSpaceToString 108.6 ns 45.77 ns 153.7 ns 13.59 ns 100.0 ns 0.0 ns 1,000.0 ns 100.0 ns 100.0 ns 9,208,633.1 488 B

Regions

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores .NET SDK=7.0.100-preview.7.22377.5 [Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2 Job-VEMLHO : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

OutlierMode=DontRemove PowerPlanMode=00000000-0000-0000-0000-000000000000 Force=False
InvocationCount=1 IterationCount=128 IterationTime=250.0000 ms
MaxIterationCount=20 MinIterationCount=15 UnrollFactor=1
WarmupCount=1

Method Mean Error StdDev StdErr Median Min Max Q1 Q3 Op/s Allocated
NonEmptyNameSpaceToString 322.7 ns 431.3 ns 1,448.5 ns 128.0 ns 0.0 ns 0.0 ns 14,300.0 ns 0.0 ns 0.0 ns 3,099,273.6 488 B

Regions w/ More Precise Writebarrier

BenchmarkDotNet=v0.13.1.1847-nightly, OS=Windows 11 (10.0.22000.856/21H2) Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores .NET SDK=7.0.100-preview.7.22377.5 [Host] : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2 Job-LKBXKI : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2

OutlierMode=DontRemove EnvironmentVariables=COMPlus_GCWriteBarrier=3,COMPlus_EnableWriteXorExecute=0 PowerPlanMode=00000000-0000-0000-0000-000000000000
Force=False InvocationCount=1 IterationCount=128
IterationTime=250.0000 ms MaxIterationCount=20 MinIterationCount=15
UnrollFactor=1 WarmupCount=1

Method Mean Error StdDev StdErr Median Min Max Q1 Q3 Op/s Allocated
NonEmptyNameSpaceToString 178.5 ns 155.5 ns 522.3 ns 46.17 ns 50.00 ns 0.0 ns 2,950.0 ns 50.00 ns 50.00 ns 5,601,750.5 488 B
mrsharm commented 2 years ago

We have been working through root-causing and fixing this issue and here are our conclusions:

  1. We root caused this issue to the introduction of a higher number and size of decommits for regions (enabled in 7.0) as opposed to segments (default in 6.0).
  2. We are working on a fix: https://github.com/dotnet/runtime/pull/73620 that improves the numbers for these benchmarks, however, even with this fix we are not at the same level as segments in terms of mean execution time, number and size of virtual commits and decommits as before and this is something we plan to further address.
  3. We think the next steps here are to merge the fix that improves the regression and work on further improvements in 8.0. More specifically, we should merge the fix mentioned above and move the issue to 8.0 (cc: @mangod9)
  4. We found that running the benchmarks above using default parameters results in an inequitable for our GC evaluation comparison purposes where:
    • There are an unequal number of invocations per iteration resulting in a lob sided effect on the mean. Fixed by passing in a derived invocation count by doing a dry run and making use of that invocation count for all our runs.
    • Outliers are removed. Fixed by not removing outliers by setting the following in the command line args to run the microbenchmarks --outliers DontRemove.
    • There are a variable number of induced GCs resulting from the unequal number of invocations and evaluation overhead runs. Fixed by removing the evaluation overhead stage.
  5. Using a more equitable comparison repeated a number of times, we consistently observed the following:
    • The regressions for the tests for regions were never above 10%.
    • More consistent results that were less volatile.

Results

Benchmark Baseline Comparand ΔMean ΔMean %
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments regions 1.56 7.62
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments decommit_fix 0.97 4.75

Details

Intel Core i9-10900K CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=7.0.100-preview.7.22377.5
  [Host]     : .NET 7.0.0 (7.0.22.37506), X64 RyuJIT AVX2
  Job-OERVVK : .NET 8.0.0 (42.42.42.42424), X64 RyuJIT AVX2

Full comparison table with the virtual commit and decommit data: Compare.xlsx

Tracking these issues here: https://github.com/dotnet/runtime/issues/73592

CC: @dotnet/gc

mangod9 commented 2 years ago

Thanks @mrsharm. @adamsitnik checking that you agree with the analysis of ensuring the GC impact is equitable across runs and with that the regression is not too high?

AndyAyersMS commented 2 years ago

I have seen this effect when comparing things too (say OSR on vs off) -- if the two variants behave differently BDN may change its measurement approach to the point where the measurement strategy becomes a contributing factor in the overall reported results.

Keeping things the "same" across the two sets of measurements often resolves this, though not always.

mangod9 commented 2 years ago

is this the .NET equivalent of the uncertainty principle :)

mrsharm commented 2 years ago

Keeping things the "same" across the two sets of measurements often resolves this, though not always.

Definitely, that's why we kept an eye on standard error and standard error normalized by the mean as well to ensure the repeatability of these experiments:

Benchmark Name Baseline Comparand Δ standard error Δ standard error / mean
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments regions -0.2263 -0.0114
System.Xml.Linq.Perf_XName.NonEmptyNameSpaceToString segments decommit_fix -0.2705 -0.0133

Getting to this low standard error was a matter of discerning a high enough invocation count described above. The result was a consistent and equitable comparison i.e., we ran this experiment for 20+ times and got similar results.

mangod9 commented 2 years ago

Assume its ok to close this now since the normalized regression after the decommit fix is reasonable?