dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[Perf] Windows/x64: 2 Regressions on 3/7/2024 9:14:14 PM #99616

Closed performanceautofiler[bot] closed 6 days ago

performanceautofiler[bot] commented 5 months ago

Run Information

| Name | Value |
|---|---|
| Architecture | x64 |
| OS | Windows 10.0.22621 |
| Queue | OwlWindows |
| Baseline | 4e86b1c63d9c41c6bfb6f42710be907199ce2671 |
| Compare | da781b3aab1bc30793812bced4a6b64d2df31a9f |
| Diff | Diff |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in Microsoft.Extensions.DependencyInjection.ActivatorUtilitiesBenchmark

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
|---|---|---|---|---|---|---|---|---|
| | 111.56 ns | 184.32 ns | 1.65 | 0.24 | False | | | |
| | 240.67 ns | 336.05 ns | 1.40 | 0.26 | False | | | |


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Microsoft.Extensions.DependencyInjection.ActivatorUtilitiesBenchmark*'

### Payloads

[Baseline]() [Compare]()

### Microsoft.Extensions.DependencyInjection.ActivatorUtilitiesBenchmark.CreateInstance_3

#### ETL Files

#### Histogram

#### JIT Disasms

### Microsoft.Extensions.DependencyInjection.ActivatorUtilitiesBenchmark.CreateInstance_5

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

DrewScoggins commented 5 months ago

Could be related to https://github.com/dotnet/runtime/pull/99367

DrewScoggins commented 5 months ago

Windows Intel Regressions: https://github.com/dotnet/perf-autofiling-issues/issues/31041

Linux Intel Regressions: https://github.com/dotnet/perf-autofiling-issues/issues/30845

AndyAyersMS commented 3 weeks ago

The linked issues show regressions in SeekUnroll; however, these were subsequently fixed.

image

The regressions in CreateInstance_{3,5} have persisted.

AndyAyersMS commented 1 week ago

For Microsoft.Extensions.DependencyInjection.ActivatorUtilitiesBenchmark.CreateInstance_3 this looks like an across-the-board regression (presumably the other test is similar):

image

Time to dig in...

AndyAyersMS commented 1 week ago

I don't see this locally (nor when comparing .NET 8 against 9.0 preview 7); going to try a different machine.

BenchmarkDotNet v0.13.13-nightly.20240311.145, Windows 11 (10.0.22631.3880/23H2/2023Update/SunValley3)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK 9.0.100-preview.7.24406.3
  [Host]     : .NET 9.0.0 (9.0.24.40507), X64 RyuJIT AVX2
  Job-TZCRMD : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-EXDDQR : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250ms MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CreateInstance_3 | Job-TZCRMD | \base-rel\corerun.exe | 159.8 ns | 8.11 ns | 9.01 ns | 156.6 ns | 151.0 ns | 180.8 ns | 1.00 | 0.08 | 0.0290 | 184 B | 1.00 |
| CreateInstance_3 | Job-EXDDQR | \diff-rel\corerun.exe | 157.2 ns | 3.52 ns | 3.77 ns | 157.4 ns | 150.8 ns | 164.9 ns | 0.99 | 0.06 | 0.0292 | 184 B | 1.00 |

AndyAyersMS commented 1 week ago

No luck on my dev box either

BenchmarkDotNet v0.13.13-nightly.20240311.145, Windows 11 (10.0.22631.3880/23H2/2023Update/SunValley3) (Hyper-V)
Intel Xeon Platinum 8370C CPU 2.80GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.5.24307.3
  [Host]     : .NET 9.0.0 (9.0.24.30607), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-LUKXCD : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-DGVVDB : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250ms MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CreateInstance_3 | Job-LUKXCD | \base-rel\corerun.exe | 213.5 ns | 0.56 ns | 0.47 ns | 213.4 ns | 213.0 ns | 214.6 ns | 1.00 | 0.0069 | 184 B | 1.00 |
| CreateInstance_3 | Job-DGVVDB | \diff-rel\corerun.exe | 206.6 ns | 0.60 ns | 0.53 ns | 206.6 ns | 205.9 ns | 207.5 ns | 0.97 | 0.0066 | 184 B | 1.00 |

AndyAyersMS commented 1 week ago

Maybe a small slowdown on Zen3

BenchmarkDotNet v0.13.13-nightly.20240311.145, Windows 11 (10.0.22631.3880/23H2/2023Update/SunValley3)
AMD Ryzen 7 5800H with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.7.24407.12
  [Host]     : .NET 6.0.20 (6.0.2023.32017), X64 RyuJIT AVX2
  Job-APBQKL : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-MEZSNA : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250ms MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CreateInstance_3 | Job-APBQKL | \base-rel\corerun.exe | 113.9 ns | 0.95 ns | 0.85 ns | 113.7 ns | 112.9 ns | 115.8 ns | 1.00 | 0.01 | 0.0217 | 184 B | 1.00 |
| CreateInstance_3 | Job-MEZSNA | \diff-rel\corerun.exe | 118.4 ns | 3.28 ns | 3.78 ns | 115.9 ns | 114.3 ns | 124.1 ns | 1.04 | 0.03 | 0.0215 | 184 B | 1.00 |

AndyAyersMS commented 1 week ago

BenchmarkDotNet v0.13.13-nightly.20240311.145, Windows 11 (10.0.22631.3880/23H2/2023Update/SunValley3)
Intel Core i9-9900T CPU 2.10GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.7.24407.12
  [Host]     : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2
  Job-OMZCPY : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-AGXDPH : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250ms MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CreateInstance_3 | Job-OMZCPY | \base-rel\corerun.exe | 215.9 ns | 13.58 ns | 15.64 ns | 208.7 ns | 200.0 ns | 257.6 ns | 1.00 | 0.10 | 0.0219 | 184 B | 1.00 |
| CreateInstance_3 | Job-AGXDPH | \diff-rel\corerun.exe | 191.9 ns | 6.41 ns | 7.39 ns | 190.2 ns | 170.0 ns | 203.2 ns | 0.89 | 0.07 | 0.0218 | 184 B | 1.00 |

AndyAyersMS commented 1 week ago

So despite the lab seeing consistent regressions, I can't repro it on any of my local boxes. Also not seeing any 8 vs 9 regressions.

BenchmarkDotNet v0.13.13-nightly.20240311.145, Windows 11 (10.0.22631.3880/23H2/2023Update/SunValley3)
Intel Core i9-9900T CPU 2.10GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.7.24407.12
  [Host]     : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2
  Job-WTOKQI : .NET 8.0.8 (8.0.824.36612), X64 RyuJIT AVX2
  Job-OXIPDW : .NET 9.0.0 (9.0.24.40507), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250ms MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

| Method | Runtime | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CreateInstance_3 | .NET 8.0 | 179.6 ns | 3.08 ns | 2.73 ns | 178.7 ns | 175.4 ns | 186.5 ns | 1.00 | 0.02 | 0.0214 | 184 B | 1.00 |
| CreateInstance_3 | .NET 9.0 | 135.9 ns | 2.78 ns | 2.85 ns | 135.3 ns | 132.0 ns | 140.7 ns | 0.76 | 0.02 | 0.0044 | 40 B | 0.22 |
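
One rough way to read the four local base-rel vs diff-rel runs earlier in the thread is to check whether the Mean +/- Error ranges of base and diff overlap. BenchmarkDotNet's legend describes Error as half of the 99.9% confidence interval. The sketch below applies that overlap check to the numbers posted above; the `verdict` helper and its labels are illustrative only, and interval overlap is a crude heuristic rather than the statistical test the lab tooling uses.

```python
# (machine, base mean, base error, diff mean, diff error), all in ns, copied
# from the four local base-rel vs diff-rel tables posted earlier in the thread.
RUNS = [
    ("Core i7-8700", 159.8, 8.11, 157.2, 3.52),
    ("Xeon Platinum 8370C", 213.5, 0.56, 206.6, 0.60),
    ("Ryzen 7 5800H", 113.9, 0.95, 118.4, 3.28),
    ("Core i9-9900T", 215.9, 13.58, 191.9, 6.41),
]


def verdict(base_mean, base_err, diff_mean, diff_err):
    """Classify diff vs. base by whether the mean +/- error ranges overlap."""
    base_lo, base_hi = base_mean - base_err, base_mean + base_err
    diff_lo, diff_hi = diff_mean - diff_err, diff_mean + diff_err
    if diff_lo > base_hi:
        return "slower"
    if diff_hi < base_lo:
        return "faster"
    return "within noise"


verdicts = {name: verdict(*nums) for name, *nums in RUNS}

for name, result in verdicts.items():
    print(f"{name}: diff is {result}")
```

By this check only the Ryzen 7 5800H run comes out slower, and the diff mean there is about 4% above base, far from the 1.65x the lab reported.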

@LoopedBard3 any chance I could hop onto one of the Owl machines and do some runs there?

LoopedBard3 commented 1 week ago

> @LoopedBard3 any chance I could hop onto one of the Owl machines and do some runs there?

Reached out offline 👍

AndyAyersMS commented 1 week ago

Going to push this out of 9.0 ... still need to dig in.

AndyAyersMS commented 6 days ago

Running on one of the OWL machines

BenchmarkDotNet v0.13.13-nightly.20240311.145, Windows 11 (10.0.22631.3880/23H2/2023Update/SunValley3)
AMD Ryzen 7 PRO 3700, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.7.24407.12
  [Host]     : .NET 9.0.0 (9.0.24.40507), X64 RyuJIT AVX2
  Job-FTYICS : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-CUGHKW : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250ms MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CreateInstance_3 | Job-FTYICS | \base-rel\corerun.exe | 192.6 ns | 2.72 ns | 2.55 ns | 192.6 ns | 187.0 ns | 195.8 ns | 1.00 | 0.02 | 0.0214 | 184 B | 1.00 |
| CreateInstance_3 | Job-CUGHKW | \diff-rel\corerun.exe | 203.2 ns | 1.61 ns | 1.34 ns | 203.8 ns | 199.5 ns | 204.3 ns | 1.06 | 0.02 | 0.0220 | 184 B | 1.00 |

BenchmarkDotNet v0.13.13-nightly.20240311.145, Windows 11 (10.0.22631.3880/23H2/2023Update/SunValley3)
AMD Ryzen 7 PRO 3700, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.7.24407.12
  [Host]     : .NET 9.0.0 (9.0.24.40507), X64 RyuJIT AVX2
  Job-VPCTBQ : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-UTFSCM : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250ms MaxIterationCount=20 MinIterationCount=15 WarmupCount=1

| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | Gen0 | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CreateInstance_3 | Job-VPCTBQ | \base-rel\corerun.exe | 194.8 ns | 1.22 ns | 1.14 ns | 194.6 ns | 193.4 ns | 197.1 ns | 1.00 | 0.0213 | 184 B | 1.00 |
| CreateInstance_3 | Job-UTFSCM | \diff-rel\corerun.exe | 192.8 ns | 0.97 ns | 0.86 ns | 193.1 ns | 190.8 ns | 194.3 ns | 0.99 | 0.0212 | 184 B | 1.00 |

AndyAyersMS commented 6 days ago

Ok, think I figured this one out.

dotnet/performance#4033 modified these benchmarks, and this happened between the base and the diff runs. So running as I've been doing (where the performance repo is fixed) shows no regression.

Closing as expected (given that the benchmark changed).