[Perf] Windows/x64: 5 Regressions on 2/3/2024 12:19:35 AM

performanceautofiler[bot] commented 7 months ago

Run Information

| Name | Value |
| --- | --- |
| Architecture | x64 |
| OS | Windows 10.0.18362 |
| Queue | TigerWindows |
| Baseline | 1a2f095fb212dcbf394f01122b9f317b7cc70fdb |
| Compare | 2361c00717a54a5dd9b0cf727102d64f783855b9 |
| Diff | Diff |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in System.Globalization.Tests.StringEquality

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [Compare_Same - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_x64_Windows 10.0.18362/System.Globalization.Tests.StringEquality.Compare_Same(Count%3a%201024%2c%20Options%3a%20(en-US%2c%20OrdinalIgnoreCase)).html>) | 761.52 ns | 973.83 ns | 1.28 | 0.00 | True | | | |
| [Compare_Same_Upper - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_x64_Windows 10.0.18362/System.Globalization.Tests.StringEquality.Compare_Same_Upper(Count%3a%201024%2c%20Options%3a%20(en-US%2c%20OrdinalIgnoreCase)).html>) | 1.19 μs | 1.28 μs | 1.08 | 0.01 | False | | | |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Globalization.Tests.StringEquality*'

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)

[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Regressions in Benchmark.GetChildKeysTests

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [AddChainedConfigurationEmpty - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_x64_Windows 10.0.18362/Benchmark.GetChildKeysTests.AddChainedConfigurationEmpty.html>) | 14.99 ms | 16.20 ms | 1.08 | 0.02 | False | | | |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Benchmark.GetChildKeysTests*'

Regressions in Span.Sorting

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [QuickSortArray - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_x64_Windows 10.0.18362/Span.Sorting.QuickSortArray(Size%3a%20512).html>) | 8.54 μs | 16.21 μs | 1.90 | 0.45 | True | | | |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Span.Sorting*'

Regressions in Benchstone.BenchI.EightQueens

| Benchmark | Baseline | Test | Test/Base | Test Quality | Edge Detector | Baseline IR | Compare IR | IR Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [Test - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_x64_Windows 10.0.18362/Benchstone.BenchI.EightQueens.Test.html>) | 1.80 μs | 2.04 μs | 1.13 | 0.03 | False | | | |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Benchstone.BenchI.EightQueens*'


DrewScoggins commented 7 months ago

Diff here: https://github.com/dotnet/runtime/compare/207e1fb35050240cdc9ae40093d831414c982316...df0778dc9eb9f15c9270ba1a09d475253018e824

Nothing is jumping out as the culprit, but there were a few JIT changes.

DrewScoggins commented 7 months ago

Linux related regressions: https://github.com/dotnet/perf-autofiling-issues/issues/28564

ghost commented 7 months ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	performanceautofiler[bot]
Assignees:	-
Labels:	`os-windows`, `arch-x64`, `area-CodeGen-coreclr`, `untriaged`, `runtime-coreclr`, `needs-area-label`
Milestone:	-

BruceForstall commented 7 months ago

Maybe https://github.com/dotnet/runtime/pull/97722?

AndyAyersMS commented 1 month ago

EightQueens seems to be an Intel-only regression, and even then only in some cases; there have been two other regressions since.

Almost all of the time is spent in TryMe.

Codegen from baseline to latest shows RBO did one jump thread (from https://github.com/dotnet/runtime/pull/97722), different layout, and an IV widening.

There are a lot of spilled CSEs here in both baseline and latest codegen, but more spill occurrences in latest. Possibly the one extra jump thread by RBO has created more critical edges and so made life more difficult for LSRA.
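To illustrate the critical-edge hypothesis: an edge is critical when its source block has more than one successor and its destination block has more than one predecessor. Moves that must happen only along that edge cannot be placed at the end of the source or the start of the destination without affecting other paths. The sketch below uses a small hypothetical flow graph (not the actual TryMe flow graph, and not the JIT's implementation) to show how redirecting a single jump can introduce a critical edge.

```python
from collections import defaultdict


def critical_edges(successors):
    """Return sorted (src, dst) edges where src has more than one successor
    and dst has more than one predecessor."""
    predecessors = defaultdict(set)
    for src, dsts in successors.items():
        for dst in dsts:
            predecessors[dst].add(src)

    result = []
    for src, dsts in successors.items():
        targets = set(dsts)
        if len(targets) < 2:
            continue  # a block with a single successor has no critical out-edges
        for dst in targets:
            if len(predecessors[dst]) > 1:
                result.append((src, dst))
    return sorted(result)


# Hypothetical CFG: B1 branches to B2/B3, both join at B4, which branches again.
before = {
    "B1": ["B2", "B3"],
    "B2": ["B4"],
    "B3": ["B4"],
    "B4": ["B5", "B6"],
    "B5": [],
    "B6": [],
}

# Same CFG after one hypothetical jump thread: B2 now bypasses B4 and goes to B5.
after = dict(before, B2=["B5"])

print(critical_edges(before))
print(critical_edges(after))
```

In this toy graph the original has no critical edges, while the threaded version has one (B4 to B5), because B5 gains a second predecessor while B4 still has two successors.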

Final flow graphs. You can clearly see the impact of RPO layout at least...

[Flow graph images: MAIN and BASELINE]
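For readers unfamiliar with the term, RPO is reverse postorder: blocks are ordered by reversing the order in which a depth-first traversal finishes them, so that in an acyclic graph every block comes before its successors. The following is a minimal sketch on a hypothetical four-block diamond. It only shows the ordering itself; the JIT's actual layout phase may weigh other inputs that are not modeled here.

```python
def reverse_postorder(successors, entry):
    """Return blocks reachable from entry in reverse postorder of a DFS."""
    visited = set()
    postorder = []

    def visit(block):
        visited.add(block)
        for succ in successors.get(block, []):
            if succ not in visited:
                visit(succ)
        postorder.append(block)  # a block finishes after all its successors

    visit(entry)
    return postorder[::-1]


# Hypothetical diamond-shaped CFG, not taken from the benchmark.
cfg = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}

order = reverse_postorder(cfg, "B1")
print(order)
```

The entry block comes first and the join block B4 comes last, after both of its predecessors.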

AndyAyersMS commented 1 month ago

Span.Sorting.QuickSortArray(Size: 512)

Regressions here were fixed by RPO layout:

AndyAyersMS commented 1 month ago

System.Globalization.Tests.StringEquality.Compare_Same(Count: 1024, Options: (en-US, OrdinalIgnoreCase))

Ditto for this benchmark

AndyAyersMS commented 1 month ago

System.Globalization.Tests.StringEquality.Compare_Same_Upper(Count: 1024, Options: (en-US, OrdinalIgnoreCase))

Same as the two above, recovers with later changes.

AndyAyersMS commented 1 month ago

Benchmark.GetChildKeysTests.AddChainedConfigurationEmpty

Ditto, like the above.

AndyAyersMS commented 1 month ago

So the only persistent regression is in EightQueens, and that one seems to be due to the increase in resolution moves added by the allocator.

Going to move this to .NET 10 as there's no simple fix available now.
