dotnet / runtime


[Perf] Windows/x64: 4 Regressions in System.Collections.CreateAddAndRemove<String> #109734

Open performanceautofiler[bot] opened 1 week ago

performanceautofiler[bot] commented 1 week ago

Run Information

| Name | Value |
|------|-------|
| Architecture | x64 |
| OS | Windows 10.0.22631 |
| Queue | ViperWindows |
| Baseline | 004d59ade00c9cdf929bc520ef6a950eb851578f |
| Compare | 302e0d4cf9d603fbc76e508b0b41e778c69f2186 |
| Diff | Diff |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in System.Numerics.Tests.Perf_VectorConvert

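| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
|-----------|---------:|-----:|----------:|-------------:|---------------|-------------|------------|----------|
| System.Numerics.Tests.Perf_VectorConvert.Convert_float_int | 469.32 ns | 546.50 ns | 1.16 | 0.04 | False | | | |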


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Numerics.Tests.Perf_VectorConvert*'
### System.Numerics.Tests.Perf_VectorConvert.Convert_float_int

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)
[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)


Regressions in System.Tests.Perf_Char

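| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
|-----------|---------:|-----:|----------:|-------------:|---------------|-------------|------------|----------|
| System.Tests.Perf_Char.Char_ToLowerInvariant(input: "Hello World!") | 5.45 ns | 7.14 ns | 1.31 | 0.10 | False | | | |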


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Char*'
### System.Tests.Perf_Char.Char_ToLowerInvariant(input: "Hello World!")

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)
[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)


Regressions in System.Memory.Span<Int32>

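| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
|-----------|---------:|-----:|----------:|-------------:|---------------|-------------|------------|----------|
| System.Memory.Span<Int32>.BinarySearch(Size: 512) | 12.56 ns | 15.31 ns | 1.22 | 0.14 | False | | | |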


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span<Int32>*'
### System.Memory.Span<Int32>.BinarySearch(Size: 512)

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)
[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)


Regressions in System.Collections.CreateAddAndRemove<String>

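| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
|-----------|---------:|-----:|----------:|-------------:|---------------|-------------|------------|----------|
| System.Collections.CreateAddAndRemove<String>.Queue(Size: 512) | 3.15 μs | 3.37 μs | 1.07 | 0.02 | False | | | |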
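
The Test/Base column in the four regression tables of this report appears to be the test time divided by the baseline time, rounded to two decimals. A minimal Python sketch that recomputes it from the reported timings follows; the dictionary keys and the helper name `ratio_test_over_base` are illustrative and not part of the perf tooling, and the Queue timings are converted from microseconds to nanoseconds by hand.

```python
# (baseline_ns, test_ns) copied from the regression tables in this report.
# The Queue benchmark is reported in microseconds; it is converted to ns here.
RESULTS = {
    "Perf_VectorConvert.Convert_float_int": (469.32, 546.50),
    "Perf_Char.Char_ToLowerInvariant": (5.45, 7.14),
    "Span<Int32>.BinarySearch": (12.56, 15.31),
    "CreateAddAndRemove<String>.Queue": (3150.0, 3370.0),
}


def ratio_test_over_base(baseline, test):
    """Return test/baseline rounded to two decimals (hypothetical helper)."""
    return round(test / baseline, 2)


ratios = {name: ratio_test_over_base(b, t) for name, (b, t) in RESULTS.items()}

for name, ratio in ratios.items():
    # A ratio above 1.00 means the compare build was slower than the baseline.
    print(f"{name}: {ratio:.2f}")
```

The recomputed values agree with the Test/Base column (1.16, 1.31, 1.22, 1.07), so the Queue benchmark that the issue title names is the smallest of the four reported slowdowns.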


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.CreateAddAndRemove<String>*'
### System.Collections.CreateAddAndRemove<String>.Queue(Size: 512)

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)
[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

LoopedBard3 commented 1 week ago

I primarily focused on the System.Collections.CreateAddAndRemove regression; the others look like noise. It is likely due to https://github.com/dotnet/runtime/pull/109258. FYI @saucecontrol and @AndyAyersMS.

Commit range: https://github.com/dotnet/runtime/compare/1c10ceecbf5356c33c67f6325072d753707f854e...30dabfd706693d231dbdf9b13a1ab93b510513e7

dotnet-policy-service[bot] commented 1 week ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

saucecontrol commented 1 week ago

There are no codegen changes for that benchmark -- it's noisy in general. Testing the ends of the commit range, I see anything from +2% to -15%.

Results are often bimodal. For example:

CreateAddAndRemove<String>.Queue: Job-MQUEJG(PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\core_root_408caa4e\CoreRun.exe, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=15, WarmupCount=1) [Size=512]
Runtime = .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI; GC = Concurrent Workstation
Mean = 1.816 us, StdErr = 0.020 us (1.12%), N = 20, StdDev = 0.091 us
Min = 1.654 us, Q1 = 1.750 us, Median = 1.793 us, Q3 = 1.882 us, Max = 1.962 us
IQR = 0.132 us, LowerFence = 1.552 us, UpperFence = 2.080 us
ConfidenceInterval = [1.737 us; 1.895 us] (CI 99.9%), Margin = 0.079 us (4.33% of Mean)
Skewness = 0.13, Kurtosis = 1.71, MValue = 3.2
-------------------- Histogram --------------------
[1.610 us ; 1.708 us) | @
[1.708 us ; 1.796 us) | @@@@@@@@@@
[1.796 us ; 1.867 us) | @
[1.867 us ; 1.955 us) | @@@@@@@
[1.955 us ; 2.006 us) | @
---------------------------------------------------

CreateAddAndRemove<String>.Queue: Job-XSVGKI(PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\core_root_f7334fab\CoreRun.exe, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=15, WarmupCount=1) [Size=512]
Runtime = .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI; GC = Concurrent Workstation
Mean = 1.578 us, StdErr = 0.004 us (0.25%), N = 13, StdDev = 0.014 us
Min = 1.563 us, Q1 = 1.569 us, Median = 1.573 us, Q3 = 1.580 us, Max = 1.613 us
IQR = 0.012 us, LowerFence = 1.551 us, UpperFence = 1.598 us
ConfidenceInterval = [1.561 us; 1.595 us] (CI 99.9%), Margin = 0.017 us (1.10% of Mean)
Skewness = 1.25, Kurtosis = 3.33, MValue = 2
-------------------- Histogram --------------------
[1.558 us ; 1.621 us) | @@@@@@@@@@@@@
---------------------------------------------------

// * Summary *

BenchmarkDotNet v0.14.1-nightly.20240924.187, Windows 11 (10.0.26100.2314)
Unknown processor
.NET SDK 9.0.100
  [Host]     : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-MQUEJG : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Job-XSVGKI : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI

PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1

| Method | Job        | Toolchain                       | Size | Mean     | Error     | StdDev    | Median   | Min      | Max      | Ratio | RatioSD | Code Size | Gen0   | Allocated | Alloc Ratio |
|------- |----------- |-------------------------------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------:|--------:|----------:|-------:|----------:|------------:|
| Queue  | Job-MQUEJG | \core_root_408caa4e\CoreRun.exe | 512  | 1.816 us | 0.0787 us | 0.0906 us | 1.793 us | 1.654 us | 1.962 us |  1.00 |    0.07 |   2,753 B | 1.0021 |    8.2 KB |        1.00 |
| Queue  | Job-XSVGKI | \core_root_f7334fab\CoreRun.exe | 512  | 1.578 us | 0.0173 us | 0.0145 us | 1.573 us | 1.563 us | 1.613 us |  0.87 |    0.04 |   2,714 B | 0.9973 |    8.2 KB |        1.00 |

// * Warnings *
MultimodalDistribution
  CreateAddAndRemove<String>.Queue: PowerPlanMode=00000000-0000-0000-0000-000000000000, Toolchain=\core_root_408caa4e\CoreRun.exe, IterationTime=250ms, MaxIterationCount=20, MinIterationCount=15, WarmupCount=1 -> It seems that the distribution can have several modes (mValue = 3.2)
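
The MValue figures in the output above can be reproduced from the printed histograms. The sketch below assumes the common "mvalue" definition: the sum of absolute differences between adjacent bin counts, with a zero bin added on each side, divided by the tallest bin. BenchmarkDotNet's own implementation may choose bins differently, so treat this as an illustration rather than the library's algorithm; the function and variable names are hypothetical.

```python
def mvalue(bin_counts):
    """Modality score of a histogram under the assumed mvalue definition.

    Pads the counts with a zero bin on each side, sums the absolute
    differences between neighbouring bins, and divides by the tallest bin.
    """
    if not bin_counts or max(bin_counts) == 0:
        raise ValueError("histogram must contain at least one non-empty bin")
    padded = [0, *bin_counts, 0]
    total_change = sum(abs(b - a) for a, b in zip(padded, padded[1:]))
    return total_change / max(bin_counts)


# Bin counts read off the two histograms above (number of '@' per row).
job_mquejg = [1, 10, 1, 7, 1]  # core_root_408caa4e, N = 20
job_xsvgki = [13]              # core_root_f7334fab, N = 13

print(mvalue(job_mquejg))  # 3.2
print(mvalue(job_xsvgki))  # 2.0
```

Under this definition a histogram with a single peak always scores exactly 2, because the counts rise to the peak once and fall once. The second peak of 7 in the first job is what lifts its score to 3.2, which matches the MultimodalDistribution warning.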
saucecontrol commented 1 week ago

Oops, I missed that there are some small diffs, just nothing HWIntrinsics-related.

https://www.diffchecker.com/cZokjbLf/

In any case, the best times across multiple runs are the same before and after.

| Method | Job        | Toolchain                       | Size | Mean     | Error     | StdDev    | Median   | Min      | Max      | Ratio | RatioSD | Code Size | Gen0   | Allocated | Alloc Ratio |
|------- |----------- |-------------------------------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------:|--------:|----------:|-------:|----------:|------------:|
| Queue  | Job-WCNCZS | \core_root_408caa4e\CoreRun.exe | 512  | 1.589 us | 0.0304 us | 0.0325 us | 1.581 us | 1.559 us | 1.665 us |  1.00 |    0.03 |   2,895 B | 0.9974 |    8.2 KB |        1.00 |
| Queue  | Job-GDAQWM | \core_root_f7334fab\CoreRun.exe | 512  | 1.571 us | 0.0111 us | 0.0093 us | 1.569 us | 1.560 us | 1.595 us |  0.99 |    0.02 |   2,856 B | 1.0021 |    8.2 KB |        1.00 |

| Method | Job        | Toolchain                       | Size | Mean     | Error     | StdDev    | Median   | Min      | Max      | Ratio | RatioSD | Code Size | Gen0   | Allocated | Alloc Ratio |
|------- |----------- |-------------------------------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------:|--------:|----------:|-------:|----------:|------------:|
| Queue  | Job-UVLVSQ | \core_root_408caa4e\CoreRun.exe | 512  | 1.568 us | 0.0153 us | 0.0128 us | 1.568 us | 1.553 us | 1.590 us |  1.00 |    0.01 |   2,753 B | 1.0021 |    8.2 KB |        1.00 |
| Queue  | Job-EMTGJG | \core_root_f7334fab\CoreRun.exe | 512  | 1.607 us | 0.0486 us | 0.0541 us | 1.598 us | 1.556 us | 1.768 us |  1.02 |    0.03 |   2,714 B | 1.0022 |    8.2 KB |        1.00 |

| Method | Job        | Toolchain                       | Size | Mean     | Error     | StdDev    | Median   | Min      | Max      | Ratio | RatioSD | Code Size | Gen0   | Allocated | Alloc Ratio |
|------- |----------- |-------------------------------- |----- |---------:|----------:|----------:|---------:|---------:|---------:|------:|--------:|----------:|-------:|----------:|------------:|
| Queue  | Job-BZHTXI | \core_root_408caa4e\CoreRun.exe | 512  | 1.864 us | 0.0469 us | 0.0540 us | 1.850 us | 1.745 us | 1.955 us |  1.00 |    0.04 |   2,753 B | 0.9973 |    8.2 KB |        1.00 |
| Queue  | Job-XXKNGE | \core_root_f7334fab\CoreRun.exe | 512  | 1.576 us | 0.0164 us | 0.0146 us | 1.571 us | 1.559 us | 1.605 us |  0.85 |    0.03 |   2,714 B | 1.0005 |    8.2 KB |        1.00 |
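
To make the comparison concrete, the Mean and Min columns from the four paired runs in this thread can be reduced to a per-run percent change and a best-of-runs minimum. This is a rough Python sketch over values copied by hand from the tables; it treats `core_root_408caa4e` as the reference because BenchmarkDotNet gives it Ratio 1.00, which is an assumption about which end of the commit range it represents. All names are illustrative.

```python
# (mean_us, min_us) per run, copied from the four BenchmarkDotNet tables.
TOOLCHAIN_408CAA4E = [(1.816, 1.654), (1.589, 1.559), (1.568, 1.553), (1.864, 1.745)]
TOOLCHAIN_F7334FAB = [(1.578, 1.563), (1.571, 1.560), (1.607, 1.556), (1.576, 1.559)]


def percent_change(reference, other):
    """Signed change of `other` relative to `reference`, in percent."""
    return (other - reference) / reference * 100.0


# Change in Mean for each paired run (negative means f7334fab was faster).
per_run = [
    round(percent_change(ref[0], other[0]), 1)
    for ref, other in zip(TOOLCHAIN_408CAA4E, TOOLCHAIN_F7334FAB)
]

# Best (lowest) Min observed across all runs for each toolchain.
best_408caa4e = min(minimum for _, minimum in TOOLCHAIN_408CAA4E)
best_f7334fab = min(minimum for _, minimum in TOOLCHAIN_F7334FAB)

print(per_run)
print(best_408caa4e, best_f7334fab)
print(round(percent_change(best_408caa4e, best_f7334fab), 2))
```

The per-run mean changes span roughly -15.5% to +2.5%, in line with the "+2% to -15%" spread reported earlier, while the best times (1.553 us and 1.556 us) differ by about 0.2%.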
JulieLeeMSFT commented 1 week ago

@AndyAyersMS, PTAL.