dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[Perf] Windows/x64: 5 Regressions on 2/3/2024 12:19:35 AM #98044

Open performanceautofiler[bot] opened 7 months ago

performanceautofiler[bot] commented 7 months ago

Run Information

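| Name | Value |
| -- | -- |
| Architecture | x64 |
| OS | Windows 10.0.18362 |
| Queue | TigerWindows |
| Baseline | [1a2f095fb212dcbf394f01122b9f317b7cc70fdb](https://github.com/dotnet/runtime/commit/1a2f095fb212dcbf394f01122b9f317b7cc70fdb) |
| Compare | [2361c00717a54a5dd9b0cf727102d64f783855b9](https://github.com/dotnet/runtime/commit/2361c00717a54a5dd9b0cf727102d64f783855b9) |
| Diff | [Diff](https://github.com/dotnet/runtime/compare/1a2f095fb212dcbf394f01122b9f317b7cc70fdb...2361c00717a54a5dd9b0cf727102d64f783855b9) |
| Configs | CompilationMode:tiered, RunKind:micro |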

Regressions in System.Globalization.Tests.StringEquality

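| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Compare_Same | 761.52 ns | 973.83 ns | 1.28 | 0.00 | True | | | |
| Compare_Same_Upper | 1.19 μs | 1.28 μs | 1.08 | 0.01 | False | | | |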


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Globalization.Tests.StringEquality*'
### Payloads

[Baseline]() [Compare]()

### System.Globalization.Tests.StringEquality.Compare_Same(Count: 1024, Options: (en-US, OrdinalIgnoreCase))

#### ETL Files

#### Histogram

#### JIT Disasms

### System.Globalization.Tests.StringEquality.Compare_Same_Upper(Count: 1024, Options: (en-US, OrdinalIgnoreCase))

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)
[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)


Regressions in Benchmark.GetChildKeysTests

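| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| AddChainedConfigurationEmpty | 14.99 ms | 16.20 ms | 1.08 | 0.02 | False | | | |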


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Benchmark.GetChildKeysTests*'
### Payloads

[Baseline]() [Compare]()

### Benchmark.GetChildKeysTests.AddChainedConfigurationEmpty

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)
[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)


Regressions in Span.Sorting

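| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| QuickSortArray | 8.54 μs | 16.21 μs | 1.90 | 0.45 | True | | | |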


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Span.Sorting*'
### Payloads

[Baseline]() [Compare]()

### Span.Sorting.QuickSortArray(Size: 512)

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)
[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)


Regressions in Benchstone.BenchI.EightQueens

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| Test | 1.80 μs | 2.04 μs | 1.13 | 0.03 | False | | | |


Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Benchstone.BenchI.EightQueens*'
### Payloads

[Baseline]() [Compare]()

### Benchstone.BenchI.EightQueens.Test

#### ETL Files

#### Histogram

#### JIT Disasms

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)
[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

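
The repro commands in this report differ only in the `--filter` argument. A minimal sketch that builds all four as argument lists, assuming the same script path and `-f net8.0` target shown in the Repro sections (the helper name `repro_command` is illustrative, and nothing is executed here):

```python
# Script path and flags as they appear in the Repro sections of this issue.
SCRIPT = r".\performance\scripts\benchmarks_ci.py"

# One filter per regressed benchmark class reported in this issue.
FILTERS = [
    "System.Globalization.Tests.StringEquality*",
    "Benchmark.GetChildKeysTests*",
    "Span.Sorting*",
    "Benchstone.BenchI.EightQueens*",
]


def repro_command(benchmark_filter, framework="net8.0"):
    """Build the argument list for one repro run; does not launch anything."""
    return ["py", SCRIPT, "-f", framework, "--filter", benchmark_filter]


COMMANDS = [repro_command(f) for f in FILTERS]

for cmd in COMMANDS:
    print(" ".join(cmd))
```

Passing an argument list like this to `subprocess.run` would avoid shell quoting differences around the `*` in the filter.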
DrewScoggins commented 7 months ago

Diff here: https://github.com/dotnet/runtime/compare/207e1fb35050240cdc9ae40093d831414c982316...df0778dc9eb9f15c9270ba1a09d475253018e824

Nothing is jumping out as the culprit, but there were a few JIT changes.

DrewScoggins commented 7 months ago

Linux related regressions: https://github.com/dotnet/perf-autofiling-issues/issues/28564

ghost commented 7 months ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: performanceautofiler[bot]
Assignees: -
Labels: `os-windows`, `arch-x64`, `area-CodeGen-coreclr`, `untriaged`, `runtime-coreclr`, `needs-area-label`
Milestone: -
BruceForstall commented 7 months ago

Maybe https://github.com/dotnet/runtime/pull/97722?

AndyAyersMS commented 1 month ago

EightQueens seems to be an Intel-only regression, and then only in some cases; there have been two other regressions since.

image

Almost all of the time is spent in TryMe.
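
For readers unfamiliar with the benchmark, here is a minimal Python sketch of the kind of backtracking routine an eight-queens solver spends its time in. It is not the Benchstone source: the name `try_row`, the array names, and the 0-based indexing are assumptions made for illustration. The relevant point is the shape of the hot code: a short loop with a compound condition, several array loads and stores, and a recursive call.

```python
N = 8  # board size


def try_row(row, cols_free, diag_up_free, diag_down_free, placement):
    """Place a queen in `row` and all later rows; return True on success."""
    for col in range(N):
        up = row + col            # index of the "/" diagonal
        down = row - col + N - 1  # index of the "\" diagonal
        if cols_free[col] and diag_up_free[up] and diag_down_free[down]:
            placement[row] = col
            cols_free[col] = diag_up_free[up] = diag_down_free[down] = False
            if row == N - 1 or try_row(
                row + 1, cols_free, diag_up_free, diag_down_free, placement
            ):
                return True
            # Backtrack: free the column and both diagonals again.
            cols_free[col] = diag_up_free[up] = diag_down_free[down] = True
    return False


def solve():
    """Return one placement (column per row), or None if none exists."""
    placement = [-1] * N
    ok = try_row(0, [True] * N, [True] * (2 * N - 1), [True] * (2 * N - 1), placement)
    return placement if ok else None


print(solve())
```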

Comparing codegen from baseline to latest shows that RBO did one jump thread (from https://github.com/dotnet/runtime/pull/97722), the block layout is different, and there is an IV widening.

There are a lot of spilled CSEs here in both baseline and latest codegen, but more spill occurrences in latest. Possibly the one extra jump thread by RBO has created more critical edges and so made life more difficult for LSRA.
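
To make the critical-edge hypothesis concrete: an edge is critical when its source block has more than one successor and its target block has more than one predecessor. The sketch below uses a small hypothetical flow graph (block names and edges are invented, not taken from the EightQueens codegen) to show how redirecting a single edge, as a jump thread does, can create a critical edge where there was none.

```python
def critical_edges(succs):
    """Return edges whose source has >1 successor and target has >1 predecessor."""
    preds = {}
    for src, targets in succs.items():
        for dst in targets:
            preds.setdefault(dst, []).append(src)
    return sorted(
        (src, dst)
        for src, targets in succs.items()
        if len(targets) > 1
        for dst in targets
        if len(preds[dst]) > 1
    )


# Hypothetical graph before jump threading: B and C both fall into D,
# which then branches to E or F.
BEFORE = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E", "F"], "E": [], "F": []}

# After threading the path through B: the outcome of D's branch is known
# on that path, so B jumps straight to E and bypasses D.
AFTER = {"A": ["B", "C"], "B": ["E"], "C": ["D"], "D": ["E", "F"], "E": [], "F": []}

print(critical_edges(BEFORE))
print(critical_edges(AFTER))
```

In this toy graph, E gains a second predecessor while D still has two successors, so D -> E becomes critical. Any fix-up code needed on that edge can no longer be placed at the end of D or the start of E without affecting other paths.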

Final flow graphs. You can clearly see the impact of RPO layout at least...

[Flow graph images: main, baseline]

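
For reference, a reverse post-order (RPO) is obtained by recording blocks as a depth-first walk finishes them and then reversing that list, so each block comes before its successors except along loop back edges. The sketch below shows only that ordering on a hypothetical four-block loop. The block names are invented, and the JIT's actual layout pass is not reproduced here.

```python
def reverse_post_order(succs, entry):
    """Return the blocks reachable from `entry` in reverse post-order."""
    visited = set()
    post = []

    def visit(block):
        visited.add(block)
        for nxt in succs.get(block, []):
            if nxt not in visited:
                visit(nxt)
        post.append(block)  # appended only after all successors are finished

    visit(entry)
    return post[::-1]


# Hypothetical loop: entry -> loop_head -> body -> loop_head (back edge),
# with loop_head also branching to exit. The order in which successors are
# listed decides which of the valid reverse post-orders is produced.
CFG = {
    "entry": ["loop_head"],
    "loop_head": ["exit", "body"],
    "body": ["loop_head"],
    "exit": [],
}

ORDER = reverse_post_order(CFG, "entry")
print(ORDER)
```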
AndyAyersMS commented 1 month ago

Span.Sorting.QuickSortArray(Size: 512)

Regressions here were fixed by RPO layout:

image

AndyAyersMS commented 1 month ago

System.Globalization.Tests.StringEquality.Compare_Same(Count: 1024, Options: (en-US, OrdinalIgnoreCase))

Ditto for this benchmark

image

AndyAyersMS commented 1 month ago

System.Globalization.Tests.StringEquality.Compare_Same_Upper(Count: 1024, Options: (en-US, OrdinalIgnoreCase))

image

Same as the two above: it recovers with later changes.

AndyAyersMS commented 1 month ago

Benchmark.GetChildKeysTests.AddChainedConfigurationEmpty

Ditto, same as the above.

image

AndyAyersMS commented 1 month ago

So the only persistent regression is in EightQueens, and that one seems to be caused by an increase in resolution moves inserted by the register allocator.
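
As background, a linear scan allocator inserts a resolution move on a flow edge whenever a live variable sits in one location at the end of the predecessor and a different one at the start of the successor. The toy model below counts such moves. All blocks, variables, and locations are invented for illustration and are not derived from the actual EightQueens allocation. It also assumes every variable live into a block is live out of each of its predecessors.

```python
def count_resolution_moves(edges, loc_at_exit, loc_at_entry):
    """Count variables whose location differs across each flow edge."""
    moves = 0
    for src, dst in edges:
        for var, dst_loc in loc_at_entry[dst].items():
            if loc_at_exit[src][var] != dst_loc:
                moves += 1  # one move (or load/store) needed on this edge
    return moves


# Hypothetical locations at block boundaries.
LOC_AT_EXIT = {
    "B1": {"i": "rcx", "q": "rdx"},
    "B2": {"i": "rcx", "q": "stack"},
    "B4": {"i": "rsi", "q": "stack"},
}
LOC_AT_ENTRY = {"B3": {"i": "rcx", "q": "rdx"}}

EDGES_BEFORE = [("B1", "B3"), ("B2", "B3")]
# One extra incoming edge, e.g. from a newly threaded jump.
EDGES_AFTER = EDGES_BEFORE + [("B4", "B3")]

MOVES_BEFORE = count_resolution_moves(EDGES_BEFORE, LOC_AT_EXIT, LOC_AT_ENTRY)
MOVES_AFTER = count_resolution_moves(EDGES_AFTER, LOC_AT_EXIT, LOC_AT_ENTRY)
print(MOVES_BEFORE, MOVES_AFTER)
```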

Going to move this to .NET 10 as there's no simple fix available now.