Regressions from 3-opt

performanceautofiler[bot] commented 2 weeks ago

Run Information

Name	Value
Architecture	arm64
OS	ubuntu 22.04
Queue	AmpereUbuntu
Baseline	408caa4e28c74d95c2af00401615a0931de4facf
Compare	73e1976f9510674d99bf4edbbe7392eac2843d41
Diff	Diff
Configs	CompilationMode:tiered, RunKind:micro

Regressions in System.Collections.Tests.Perf_BitArray

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector
[BitArrayAnd - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Collections.Tests.Perf_BitArray.BitArrayAnd(Size%3a%20512).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	30.10 ns	34.89 ns	1.16	0.13	False
[BitArrayOr - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Collections.Tests.Perf_BitArray.BitArrayOr(Size%3a%20512).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	29.06 ns	34.64 ns	1.19	0.14	False
[BitArrayXor - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Collections.Tests.Perf_BitArray.BitArrayXor(Size%3a%20512).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	28.98 ns	34.14 ns	1.18	0.12	False
[BitArrayNot - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Collections.Tests.Perf_BitArray.BitArrayNot(Size%3a%20512).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	22.98 ns	27.61 ns	1.20	0.20	False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Collections.Tests.Perf_BitArray*'

### Docs

[Profiling workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md)
[Benchmarking workflow for dotnet/runtime repository](https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md)

Regressions in Span.IndexerBench

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
[CoveredIndex2 - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/Span.IndexerBench.CoveredIndex2(length%3a%201024).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	1.21 μs	1.38 μs	1.14	0.00	False
[CoveredIndex3 - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/Span.IndexerBench.CoveredIndex3(length%3a%201024).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	1.72 μs	2.06 μs	1.20	0.00	False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'Span.IndexerBench*'

Regressions in System.Globalization.Tests.StringSearch

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector
[IndexOf_Word_NotFound - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(%2c%20None%2c%20False)).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	539.40 ns	623.39 ns	1.16	0.01	False
[LastIndexOf_Word_NotFound - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(%2c%20None%2c%20False)).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	540.64 ns	624.02 ns	1.15	0.01	False
[LastIndexOf_Word_NotFound - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(en-US%2c%20IgnoreNonSpace%2c%20False)).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	539.27 ns	623.78 ns	1.16	0.01	False
[IndexOf_Word_NotFound - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(en-US%2c%20IgnoreCase%2c%20False)).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	768.06 ns	850.33 ns	1.11	0.01	False
[IndexOf_Word_NotFound - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(en-US%2c%20IgnoreNonSpace%2c%20False)).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	539.11 ns	624.33 ns	1.16	0.01	False
[IndexOf_Word_NotFound - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(en-US%2c%20None%2c%20False)).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	539.27 ns	624.84 ns	1.16	0.01	False
[LastIndexOf_Word_NotFound - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(en-US%2c%20IgnoreCase%2c%20False)).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	798.90 ns	885.54 ns	1.11	0.03	False
[IndexOf_Word_NotFound - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options%3a%20(%2c%20IgnoreCase%2c%20False)).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	767.89 ns	852.57 ns	1.11	0.01	False
[LastIndexOf_Word_NotFound - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(en-US%2c%20None%2c%20False)).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	539.20 ns	624.38 ns	1.16	0.01	False
[LastIndexOf_Word_NotFound - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options%3a%20(%2c%20IgnoreCase%2c%20False)).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	799.43 ns	878.06 ns	1.10	0.00	False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Globalization.Tests.StringSearch*'

Regressions in PerfLabTests.LowLevelPerf

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
[ForeachOverList100Elements - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/PerfLabTests.LowLevelPerf.ForeachOverList100Elements.html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	8.45 ms	10.10 ms	1.19	0.01	False
[InterfaceInterfaceMethodLongHierarchy - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/PerfLabTests.LowLevelPerf.InterfaceInterfaceMethodLongHierarchy.html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	303.39 μs	334.28 μs	1.10	0.05	False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'PerfLabTests.LowLevelPerf*'

Regressions in System.Collections.IterateForEach<Int32>

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
[FrozenSet - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Collections.IterateForEach(Int32).FrozenSet(Size%3a%20512).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	261.70 ns	349.31 ns	1.33	0.01	False
[List - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Collections.IterateForEach(Int32).List(Size%3a%20512).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	434.29 ns	519.44 ns	1.20	0.01	False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Collections.IterateForEach<Int32>*'

Regressions in System.Memory.Span<Int32>

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
[IndexOfAnyFourValues - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Memory.Span(Int32).IndexOfAnyFourValues(Size%3a%20512).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	863.82 ns	1.12 μs	1.29	0.01	False
[IndexOfAnyFiveValues - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Memory.Span(Int32).IndexOfAnyFiveValues(Size%3a%20512).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	1.04 μs	1.29 μs	1.25	0.01	False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span<Int32>*'

Regressions in System.Tests.Perf_Char

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
[Char_IsLower - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Tests.Perf_Char.Char_IsLower(input%3a%20%22Good%20afternoon%2c%20Constable!%22).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	25.08 ns	32.27 ns	1.29	0.03	False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Char*'

Regressions in Struct.SpanWrapper

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
[WrapperSum - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/Struct.SpanWrapper.WrapperSum.html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	6.79 μs	10.05 μs	1.48	0.01	False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'Struct.SpanWrapper*'

Regressions in System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
[Enumerate - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/System.Collections.Tests.Perf_PriorityQueue(Int32%2c%20Int32).Enumerate(Size%3a%20100).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	104.38 ns	119.96 ns	1.15	0.01	False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>*'

Regressions in Span.Sorting

Benchmark	Baseline	Test	Test/Base	Test Quality	Edge Detector	Baseline IR	Compare IR	IR Ratio
[BubbleSortSpan - Duration of single invocation](<https://pvscmdupload.z22.web.core.windows.net/reports/allTestHistory/refs/heads/main_arm64_ubuntu 22.04/Span.Sorting.BubbleSortSpan(Size%3a%20512).html>) 📝 - Benchmark Source ADX - Test Multi Config Graph	218.42 μs	242.60 μs	1.11	0.00	False

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net8.0 --filter 'Span.Sorting*'

AndyAyersMS commented 2 weeks ago

https://github.com/dotnet/runtime/pull/103450

AndyAyersMS commented 2 weeks ago

@amanasifkhalid FYI

Improvements:

Regressions:

amanasifkhalid commented 2 weeks ago

I took a look at a few of the regressions, and many of them seem to stem from mis-rotated loops. Because the cost model currently doesn't differentiate between conditional and unconditional jumps, 3-opt tends to make naive decisions about moving loop exits. For example, from Struct.SpanWrapper.WrapperSum:

*************** In fgSearchImprovedLayout()

Initial BasicBlocks
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004]  1                             1      2 [???..???)-> BB03(1)                 (always)                     i LIR IBC internal
BB02 [0001]  1       BB06                 99.41 167 [00C..019)-> BB03(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB03 [0002]  2       BB02,BB01           100    168 [019..01A)-> BB05(0.48),BB04(0.52)   ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB04 [0010]  1       BB03                 52.00  87 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB06 [0012]  2       BB04,BB05           100    168 [019..022)-> BB02(0.994),BB07(0.00595)   ( cond )                     i LIR IBC bwd bwd-src
BB05 [0011]  1       BB03                 48     81 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB07 [0003]  1       BB06                  0.60   1 [022..024)                           (return)                     i LIR IBC
BB08 [0013]  0                             0        [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Running 3-opt for main method body
Creating fallthrough for BB06 -> BB02 (current partition score = 87.394958, new partition score = 167.067227)
Creating fallthrough for BB04 -> BB06 (current partition score = 87.394958, new partition score = 168.067227)

*************** Finishing PHASE Optimize layout
Trees after Optimize layout

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004]  1                             1      2 [???..???)-> BB03(1)                 (always)                     i LIR IBC internal
BB04 [0010]  1       BB03                 52.00  87 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB06 [0012]  2       BB04,BB05           100    168 [019..022)-> BB02(0.994),BB07(0.00595)   ( cond )                     i LIR IBC bwd bwd-src
BB02 [0001]  1       BB06                 99.41 167 [00C..019)-> BB03(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB03 [0002]  2       BB02,BB01           100    168 [019..01A)-> BB05(0.48),BB04(0.52)   ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB05 [0011]  1       BB03                 48     81 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB07 [0003]  1       BB06                  0.60   1 [022..024)                           (return)                     i LIR IBC
BB08 [0013]  0                             0        [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

If we can tweak the cost model such that it decides creating fallthrough for BB06 -> BB02 is unprofitable, then 3-opt will instead create fallthrough for BB06 -> BB07, thus creating the ideal loop exit shape. As a consequence, we will push BB05 further out-of-line; in order to consider moving BB05 back into the loop body, we'd probably have to model forward vs backward jumps in the cost model to make such a move profitable.
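To make the scoring argument concrete, here is a minimal Python sketch of a fallthrough-based layout score, using the normalized block weights and edge likelihoods from the WrapperSum dump above. It is a toy model, not the JIT's implementation: the names `EDGES` and `layout_score` are hypothetical, and the real 3-opt pass compares partition scores computed from raw profile counts, so these numbers will not match the log output.

```python
# Toy model only: (block weight, {successor: likelihood}) copied from the
# WrapperSum dump. EDGES and layout_score are illustrative names, not JIT APIs.
EDGES = {
    "BB01": (1.0,   {"BB03": 1.0}),
    "BB02": (99.41, {"BB03": 1.0}),
    "BB03": (100.0, {"BB05": 0.48, "BB04": 0.52}),
    "BB04": (52.0,  {"BB06": 1.0}),
    "BB05": (48.0,  {"BB06": 1.0}),
    "BB06": (100.0, {"BB02": 0.994, "BB07": 0.00595}),
    "BB07": (0.60,  {}),  # return
    "BB08": (0.0,   {}),  # throw
}


def layout_score(layout):
    """Sum the weights of edges whose target is the next block in the layout."""
    score = 0.0
    for block, next_block in zip(layout, layout[1:]):
        weight, successors = EDGES[block]
        # An edge contributes only if it falls through to its successor.
        score += weight * successors.get(next_block, 0.0)
    return score


# Block order before the layout phase.
INITIAL = ["BB01", "BB02", "BB03", "BB04", "BB06", "BB05", "BB07", "BB08"]
# Block order produced by 3-opt in the dump.
AFTER_3OPT = ["BB01", "BB04", "BB06", "BB02", "BB03", "BB05", "BB07", "BB08"]
# Hypothetical order where the loop exit BB06 -> BB07 falls through instead.
EXIT_FALLTHROUGH = ["BB01", "BB02", "BB03", "BB04", "BB06", "BB07", "BB05", "BB08"]

for name, layout in [("initial", INITIAL),
                     ("after 3-opt", AFTER_3OPT),
                     ("exit fallthrough", EXIT_FALLTHROUGH)]:
    print(f"{name}: {layout_score(layout):.3f}")
```

Under this simplified score the rotated layout comes out well ahead (roughly 298.8 versus 203.4 for the initial order), because the heavy BB06 -> BB02 edge counts in full even though BB06 ends in a conditional branch in either layout. The exit-fallthrough order gains only about 0.6 over the initial one, which is why an unadjusted score never prefers it.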

PerfLabTests.LowLevelPerf.ForeachOverList100Elements has a similar shape:

*************** In fgSearchImprovedLayout()

Initial BasicBlocks
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight     IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0010]  1                             1      116 [???..???)-> BB04(1)                 (always)                     i LIR IBC internal
BB03 [0003]  1       BB09                 98.56 11470 [015..021)-> BB04(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB04 [0004]  3       BB02,BB03,BB01      100    11637 [021..022)-> BB11(0.2),BB05(0.8)     ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB05 [0018]  1       BB04                 80     9309 [021..022)-> BB07(0.48),BB06(0.52)   ( cond )                     i LIR IBC bwd
BB06 [0019]  1       BB05                 41.60  4841 [021..022)-> BB09(1)                 (always)                     i LIR IBC idxlen bwd
BB07 [0020]  1       BB05                 58.40  6796 [021..022)-> BB09(1)                 (always)                     i LIR IBC bwd
BB09 [0021]  2       BB06,BB07           100    11637 [021..02A)-> BB03(0.986),BB10(0.0144)  ( cond )                     i LIR IBC bwd bwd-src
BB10 [0005]  1       BB09                  1.44   167 [02A..046)-> BB02(0.994),BB12(0.00595)   ( cond )                     i LIR IBC bwd
BB02 [0001]  1       BB10                  1.44   167 [00C..013)-> BB04(1)                 (always)                     i LIR IBC loophead nullcheck bwd bwd-target
BB12 [0009]  1       BB10                  0.01     1 [046..048)                           (return)                     i LIR IBC
BB11 [0023]  1       BB04                  0        0 [021..022)                           (throw )                     i LIR IBC rare hascall gcsafe bwd
BB13 [0028]  0                             0          [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Running 3-opt for main method body
Creating fallthrough for BB09 -> BB03 (current partition score = 6962.966716, new partition score = 11469.746967)
Creating fallthrough for BB07 -> BB09 (current partition score = 0.000000, new partition score = 6795.899489)

*************** Finishing PHASE Optimize layout
Trees after Optimize layout

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight     IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0010]  1                             1      116 [???..???)-> BB04(1)                 (always)                     i LIR IBC internal
BB07 [0020]  1       BB05                 58.40  6796 [021..022)-> BB09(1)                 (always)                     i LIR IBC bwd
BB09 [0021]  2       BB06,BB07           100    11637 [021..02A)-> BB03(0.986),BB10(0.0144)  ( cond )                     i LIR IBC bwd bwd-src
BB03 [0003]  1       BB09                 98.56 11470 [015..021)-> BB04(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB04 [0004]  3       BB02,BB03,BB01      100    11637 [021..022)-> BB11(0.2),BB05(0.8)     ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB05 [0018]  1       BB04                 80     9309 [021..022)-> BB07(0.48),BB06(0.52)   ( cond )                     i LIR IBC bwd
BB06 [0019]  1       BB05                 41.60  4841 [021..022)-> BB09(1)                 (always)                     i LIR IBC idxlen bwd
BB10 [0005]  1       BB09                  1.44   167 [02A..046)-> BB02(0.994),BB12(0.00595)   ( cond )                     i LIR IBC bwd
BB02 [0001]  1       BB10                  1.44   167 [00C..013)-> BB04(1)                 (always)                     i LIR IBC loophead nullcheck bwd bwd-target
BB12 [0009]  1       BB10                  0.01     1 [046..048)                           (return)                     i LIR IBC
BB11 [0023]  1       BB04                  0        0 [021..022)                           (throw )                     i LIR IBC rare hascall gcsafe bwd
BB13 [0028]  0                             0          [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

I suspect that applying some constant penalty to conditional jumps, so that the BB09 -> BB03 move becomes unprofitable, would fix this.

Since 3-opt currently optimizes for maximal layout score (only because it's cheaper to sum the weights of edges that now fall through than to sum the weights of edges that now don't), I suspect we want to begin by penalizing scores for conditional jumps by some multiplier k, where 0 < k < 1. @AndyAyersMS do you have a recommended starting point for k, or is this a matter of trial and error? I suppose if we want to try modeling something as granular as what Young et al. describe in "Near-optimal Intraprocedural Branch Alignment", we're better off refactoring 3-opt to minimize cost instead of maximizing score.
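The relationship between score and cost, and one possible reading of the multiplier k, can be sketched in Python with the WrapperSum weights from earlier in the thread. Everything here is an assumption made for illustration: `fallthrough_score`, `jump_cost`, and the choice to apply k to any fallthrough edge that leaves a two-successor (conditional) block are hypothetical and are not taken from the JIT sources.

```python
# Toy model only: (block weight, {successor: likelihood}) from the WrapperSum dump.
EDGES = {
    "BB01": (1.0,   {"BB03": 1.0}),
    "BB02": (99.41, {"BB03": 1.0}),
    "BB03": (100.0, {"BB05": 0.48, "BB04": 0.52}),
    "BB04": (52.0,  {"BB06": 1.0}),
    "BB05": (48.0,  {"BB06": 1.0}),
    "BB06": (100.0, {"BB02": 0.994, "BB07": 0.00595}),
    "BB07": (0.60,  {}),  # return
    "BB08": (0.0,   {}),  # throw
}

INITIAL = ["BB01", "BB02", "BB03", "BB04", "BB06", "BB05", "BB07", "BB08"]
AFTER_3OPT = ["BB01", "BB04", "BB06", "BB02", "BB03", "BB05", "BB07", "BB08"]

# Total weight of all edges, independent of layout.
TOTAL_EDGE_WEIGHT = sum(
    weight * likelihood
    for weight, successors in EDGES.values()
    for likelihood in successors.values()
)


def fallthrough_score(layout, k=1.0):
    """Score to maximize: weights of fallthrough edges.

    Assumption: edges leaving a conditional (two-successor) block are
    scaled by k, with 0 < k <= 1; k = 1.0 means no penalty.
    """
    score = 0.0
    for block, next_block in zip(layout, layout[1:]):
        weight, successors = EDGES[block]
        if next_block in successors:
            factor = k if len(successors) > 1 else 1.0
            score += factor * weight * successors[next_block]
    return score


def jump_cost(layout):
    """Cost to minimize: weights of edges that do not fall through."""
    position = {block: index for index, block in enumerate(layout)}
    cost = 0.0
    for block, (weight, successors) in EDGES.items():
        for successor, likelihood in successors.items():
            if position[successor] != position[block] + 1:
                cost += weight * likelihood
    return cost


for k in (1.0, 0.75, 0.5):
    print(f"k={k}: initial={fallthrough_score(INITIAL, k):.2f}, "
          f"after 3-opt={fallthrough_score(AFTER_3OPT, k):.2f}")
```

With k = 1.0, `fallthrough_score(layout) + jump_cost(layout)` equals `TOTAL_EDGE_WEIGHT` for any layout, so maximizing the score and minimizing the cost select the same layout. Once k differs from 1 that identity no longer holds, which is one way to see why a finer-grained model may be easier to express as a cost.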

AndyAyersMS commented 2 weeks ago

> penalizing scores for conditional jumps by some multiplier k

I would think the value of k would be dependent on the likelihood of branching; something like k = 1 - (likelihood of branching). But this isn't quite right because a highly predictable branch should be somewhat cheaper than a less predictable branch (and we can use likelihoods close to 1 as indicators of predictability).

But I agree it is confusing to think in benefit terms, as I really think of this as a cost minimization problem....

LoopedBard3 commented 1 week ago

GitHub missed linking the original PR: https://github.com/dotnet/runtime/pull/103450

dotnet / runtime

Regressions from 3-opt #109613
