dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Regressions from 3-opt #109613

Open performanceautofiler[bot] opened 2 weeks ago

performanceautofiler[bot] commented 2 weeks ago

Run Information

| Name | Value |
| --- | --- |
| Architecture | arm64 |
| OS | ubuntu 22.04 |
| Queue | AmpereUbuntu |
| Baseline | 408caa4e28c74d95c2af00401615a0931de4facf |
| Compare | 73e1976f9510674d99bf4edbbe7392eac2843d41 |
| Configs | CompilationMode:tiered, RunKind:micro |

Regressions in System.Collections.Tests.Perf_BitArray

| Baseline | Test | Test/Base | Test Quality | Edge Detector |
| --- | --- | --- | --- | --- |
| 30.10 ns | 34.89 ns | 1.16 | 0.13 | False |
| 29.06 ns | 34.64 ns | 1.19 | 0.14 | False |
| 28.98 ns | 34.14 ns | 1.18 | 0.12 | False |
| 22.98 ns | 27.61 ns | 1.20 | 0.20 | False |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Tests.Perf_BitArray*'
Benchmarks:
- System.Collections.Tests.Perf_BitArray.BitArrayAnd(Size: 512)
- System.Collections.Tests.Perf_BitArray.BitArrayOr(Size: 512)
- System.Collections.Tests.Perf_BitArray.BitArrayXor(Size: 512)
- System.Collections.Tests.Perf_BitArray.BitArrayNot(Size: 512)

Docs:
- Profiling workflow for dotnet/runtime repository: https://github.com/dotnet/performance/blob/master/docs/profiling-workflow-dotnet-runtime.md
- Benchmarking workflow for dotnet/runtime repository: https://github.com/dotnet/performance/blob/master/docs/benchmarking-workflow-dotnet-runtime.md

Regressions in Span.IndexerBench

| Baseline | Test | Test/Base | Test Quality | Edge Detector |
| --- | --- | --- | --- | --- |
| 1.21 μs | 1.38 μs | 1.14 | 0.00 | False |
| 1.72 μs | 2.06 μs | 1.20 | 0.00 | False |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Span.IndexerBench*'
Benchmarks:
- Span.IndexerBench.CoveredIndex2(length: 1024)
- Span.IndexerBench.CoveredIndex3(length: 1024)

Regressions in System.Globalization.Tests.StringSearch

| Baseline | Test | Test/Base | Test Quality | Edge Detector |
| --- | --- | --- | --- | --- |
| 539.40 ns | 623.39 ns | 1.16 | 0.01 | False |
| 540.64 ns | 624.02 ns | 1.15 | 0.01 | False |
| 539.27 ns | 623.78 ns | 1.16 | 0.01 | False |
| 768.06 ns | 850.33 ns | 1.11 | 0.01 | False |
| 539.11 ns | 624.33 ns | 1.16 | 0.01 | False |
| 539.27 ns | 624.84 ns | 1.16 | 0.01 | False |
| 798.90 ns | 885.54 ns | 1.11 | 0.03 | False |
| 767.89 ns | 852.57 ns | 1.11 | 0.01 | False |
| 539.20 ns | 624.38 ns | 1.16 | 0.01 | False |
| 799.43 ns | 878.06 ns | 1.10 | 0.00 | False |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Globalization.Tests.StringSearch*'
Benchmarks:
- System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, None, False))
- System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, None, False))
- System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreNonSpace, False))
- System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False))
- System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, IgnoreNonSpace, False))
- System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (en-US, None, False))
- System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, IgnoreCase, False))
- System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, IgnoreCase, False))
- System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (en-US, None, False))
- System.Globalization.Tests.StringSearch.LastIndexOf_Word_NotFound(Options: (, IgnoreCase, False))

Regressions in PerfLabTests.LowLevelPerf

| Baseline | Test | Test/Base | Test Quality | Edge Detector |
| --- | --- | --- | --- | --- |
| 8.45 ms | 10.10 ms | 1.19 | 0.01 | False |
| 303.39 μs | 334.28 μs | 1.10 | 0.05 | False |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'PerfLabTests.LowLevelPerf*'
Benchmarks:
- PerfLabTests.LowLevelPerf.ForeachOverList100Elements
- PerfLabTests.LowLevelPerf.InterfaceInterfaceMethodLongHierarchy

Regressions in System.Collections.IterateForEach<Int32>

| Baseline | Test | Test/Base | Test Quality | Edge Detector |
| --- | --- | --- | --- | --- |
| 261.70 ns | 349.31 ns | 1.33 | 0.01 | False |
| 434.29 ns | 519.44 ns | 1.20 | 0.01 | False |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.IterateForEach<Int32>*'
Benchmarks:
- System.Collections.IterateForEach<Int32>.FrozenSet(Size: 512)
- System.Collections.IterateForEach<Int32>.List(Size: 512)

Regressions in System.Memory.Span<Int32>

| Baseline | Test | Test/Base | Test Quality | Edge Detector |
| --- | --- | --- | --- | --- |
| 863.82 ns | 1.12 μs | 1.29 | 0.01 | False |
| 1.04 μs | 1.29 μs | 1.25 | 0.01 | False |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Memory.Span<Int32>*'
Benchmarks:
- System.Memory.Span<Int32>.IndexOfAnyFourValues(Size: 512)
- System.Memory.Span<Int32>.IndexOfAnyFiveValues(Size: 512)

Regressions in System.Tests.Perf_Char

| Baseline | Test | Test/Base | Test Quality | Edge Detector |
| --- | --- | --- | --- | --- |
| 25.08 ns | 32.27 ns | 1.29 | 0.03 | False |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Tests.Perf_Char*'
Benchmark:
- System.Tests.Perf_Char.Char_IsLower(input: "Good afternoon, Constable!")

Regressions in Struct.SpanWrapper

| Baseline | Test | Test/Base | Test Quality | Edge Detector |
| --- | --- | --- | --- | --- |
| 6.79 μs | 10.05 μs | 1.48 | 0.01 | False |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Struct.SpanWrapper*'
Benchmark:
- Struct.SpanWrapper.WrapperSum

Regressions in System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>

| Baseline | Test | Test/Base | Test Quality | Edge Detector |
| --- | --- | --- | --- | --- |
| 104.38 ns | 119.96 ns | 1.15 | 0.01 | False |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>*'
Benchmark:
- System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>.Enumerate(Size: 100)

Regressions in Span.Sorting

| Baseline | Test | Test/Base | Test Quality | Edge Detector |
| --- | --- | --- | --- | --- |
| 218.42 μs | 242.60 μs | 1.11 | 0.00 | False |

Repro

General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md

git clone https://github.com/dotnet/performance.git
python3 .\performance\scripts\benchmarks_ci.py -f net8.0 --filter 'Span.Sorting*'
Benchmark:
- Span.Sorting.BubbleSortSpan(Size: 512)

AndyAyersMS commented 2 weeks ago

https://github.com/dotnet/runtime/pull/103450

AndyAyersMS commented 2 weeks ago

@amanasifkhalid FYI

Improvements:

Regressions:

amanasifkhalid commented 2 weeks ago

I took a look at a few of the regressions, and many of them seem to stem from mis-rotated loops. Because the cost model currently doesn't differentiate between conditional and unconditional jumps, 3-opt tends to make naive decisions about moving loop exits. For example, from Struct.SpanWrapper.WrapperSum:

*************** In fgSearchImprovedLayout()

Initial BasicBlocks
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004]  1                             1      2 [???..???)-> BB03(1)                 (always)                     i LIR IBC internal
BB02 [0001]  1       BB06                 99.41 167 [00C..019)-> BB03(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB03 [0002]  2       BB02,BB01           100    168 [019..01A)-> BB05(0.48),BB04(0.52)   ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB04 [0010]  1       BB03                 52.00  87 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB06 [0012]  2       BB04,BB05           100    168 [019..022)-> BB02(0.994),BB07(0.00595)   ( cond )                     i LIR IBC bwd bwd-src
BB05 [0011]  1       BB03                 48     81 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB07 [0003]  1       BB06                  0.60   1 [022..024)                           (return)                     i LIR IBC
BB08 [0013]  0                             0        [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Running 3-opt for main method body
Creating fallthrough for BB06 -> BB02 (current partition score = 87.394958, new partition score = 167.067227)
Creating fallthrough for BB04 -> BB06 (current partition score = 87.394958, new partition score = 168.067227)

*************** Finishing PHASE Optimize layout
Trees after Optimize layout

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight   IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004]  1                             1      2 [???..???)-> BB03(1)                 (always)                     i LIR IBC internal
BB04 [0010]  1       BB03                 52.00  87 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB06 [0012]  2       BB04,BB05           100    168 [019..022)-> BB02(0.994),BB07(0.00595)   ( cond )                     i LIR IBC bwd bwd-src
BB02 [0001]  1       BB06                 99.41 167 [00C..019)-> BB03(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB03 [0002]  2       BB02,BB01           100    168 [019..01A)-> BB05(0.48),BB04(0.52)   ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB05 [0011]  1       BB03                 48     81 [019..01A)-> BB06(1)                 (always)                     i LIR IBC bwd
BB07 [0003]  1       BB06                  0.60   1 [022..024)                           (return)                     i LIR IBC
BB08 [0013]  0                             0        [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

If we can tweak the cost model so that it deems creating fallthrough for BB06 -> BB02 unprofitable, 3-opt will instead create fallthrough for BB06 -> BB07, producing the ideal loop exit shape. As a consequence, BB05 will be pushed further out of line; to consider moving BB05 back into the loop body, we'd probably have to model forward vs. backward jumps in the cost model to make such a move profitable.
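To make the score argument concrete, here is a minimal sketch of a whole-layout fall-through score: for every adjacent pair of blocks that is also a flow edge, add weight(src) × likelihood(src -> dst). The block weights and likelihoods are copied from the WrapperSum dump above. The scoring function is an assumption for illustration, not the JIT's partition score, so it does not reproduce the logged numbers; `exit_fallthrough` is a hypothetical ordering in which BB06 falls into the exit BB07.

```python
# Simplified fall-through score (an assumption, not the JIT's implementation).
# Weights and successor likelihoods are copied from the WrapperSum dump.
WEIGHT = {"BB01": 1.0, "BB02": 99.41, "BB03": 100.0, "BB04": 52.0,
          "BB05": 48.0, "BB06": 100.0, "BB07": 0.60, "BB08": 0.0}
SUCCS = {
    "BB01": {"BB03": 1.0},
    "BB02": {"BB03": 1.0},
    "BB03": {"BB05": 0.48, "BB04": 0.52},
    "BB04": {"BB06": 1.0},
    "BB05": {"BB06": 1.0},
    "BB06": {"BB02": 0.994, "BB07": 0.00595},
    "BB07": {},  # return
    "BB08": {},  # throw
}

def layout_score(order):
    """Sum the weights of edges that become fall-throughs in `order`."""
    score = 0.0
    for src, dst in zip(order, order[1:]):
        # Zero if dst is not a successor of src (no fall-through edge).
        score += WEIGHT[src] * SUCCS[src].get(dst, 0.0)
    return score

initial = ["BB01", "BB02", "BB03", "BB04", "BB06", "BB05", "BB07", "BB08"]
three_opt = ["BB01", "BB04", "BB06", "BB02", "BB03", "BB05", "BB07", "BB08"]
# Hypothetical ordering: BB06 falls into the loop exit BB07.
exit_fallthrough = ["BB01", "BB02", "BB03", "BB04", "BB06", "BB07", "BB05", "BB08"]

for name, order in [("initial", initial), ("3-opt", three_opt),
                    ("exit fall-through", exit_fallthrough)]:
    print(f"{name:>18}: {layout_score(order):.3f}")
```

Under this model the 3-opt ordering scores 298.81, against 203.41 for the initial ordering and 204.005 for the hypothetical exit fall-through ordering. Nothing in the score distinguishes a conditional fall-through from an unconditional one, so a pure maximizer happily trades the loop exit fall-through for the hot BB06 -> BB02 edge.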

PerfLabTests.LowLevelPerf.ForeachOverList100Elements has a similar shape:

*************** In fgSearchImprovedLayout()

Initial BasicBlocks
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight     IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0010]  1                             1      116 [???..???)-> BB04(1)                 (always)                     i LIR IBC internal
BB03 [0003]  1       BB09                 98.56 11470 [015..021)-> BB04(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB04 [0004]  3       BB02,BB03,BB01      100    11637 [021..022)-> BB11(0.2),BB05(0.8)     ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB05 [0018]  1       BB04                 80     9309 [021..022)-> BB07(0.48),BB06(0.52)   ( cond )                     i LIR IBC bwd
BB06 [0019]  1       BB05                 41.60  4841 [021..022)-> BB09(1)                 (always)                     i LIR IBC idxlen bwd
BB07 [0020]  1       BB05                 58.40  6796 [021..022)-> BB09(1)                 (always)                     i LIR IBC bwd
BB09 [0021]  2       BB06,BB07           100    11637 [021..02A)-> BB03(0.986),BB10(0.0144)  ( cond )                     i LIR IBC bwd bwd-src
BB10 [0005]  1       BB09                  1.44   167 [02A..046)-> BB02(0.994),BB12(0.00595)   ( cond )                     i LIR IBC bwd
BB02 [0001]  1       BB10                  1.44   167 [00C..013)-> BB04(1)                 (always)                     i LIR IBC loophead nullcheck bwd bwd-target
BB12 [0009]  1       BB10                  0.01     1 [046..048)                           (return)                     i LIR IBC
BB11 [0023]  1       BB04                  0        0 [021..022)                           (throw )                     i LIR IBC rare hascall gcsafe bwd
BB13 [0028]  0                             0          [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Running 3-opt for main method body
Creating fallthrough for BB09 -> BB03 (current partition score = 6962.966716, new partition score = 11469.746967)
Creating fallthrough for BB07 -> BB09 (current partition score = 0.000000, new partition score = 6795.899489)

*************** Finishing PHASE Optimize layout
Trees after Optimize layout

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight     IBC [IL range]   [jump]                            [EH region]        [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0010]  1                             1      116 [???..???)-> BB04(1)                 (always)                     i LIR IBC internal
BB07 [0020]  1       BB05                 58.40  6796 [021..022)-> BB09(1)                 (always)                     i LIR IBC bwd
BB09 [0021]  2       BB06,BB07           100    11637 [021..02A)-> BB03(0.986),BB10(0.0144)  ( cond )                     i LIR IBC bwd bwd-src
BB03 [0003]  1       BB09                 98.56 11470 [015..021)-> BB04(1)                 (always)                     i LIR IBC loophead bwd bwd-target
BB04 [0004]  3       BB02,BB03,BB01      100    11637 [021..022)-> BB11(0.2),BB05(0.8)     ( cond )                     i LIR IBC bwd bwd-src osr-entry
BB05 [0018]  1       BB04                 80     9309 [021..022)-> BB07(0.48),BB06(0.52)   ( cond )                     i LIR IBC bwd
BB06 [0019]  1       BB05                 41.60  4841 [021..022)-> BB09(1)                 (always)                     i LIR IBC idxlen bwd
BB10 [0005]  1       BB09                  1.44   167 [02A..046)-> BB02(0.994),BB12(0.00595)   ( cond )                     i LIR IBC bwd
BB02 [0001]  1       BB10                  1.44   167 [00C..013)-> BB04(1)                 (always)                     i LIR IBC loophead nullcheck bwd bwd-target
BB12 [0009]  1       BB10                  0.01     1 [046..048)                           (return)                     i LIR IBC
BB11 [0023]  1       BB04                  0        0 [021..022)                           (throw )                     i LIR IBC rare hascall gcsafe bwd
BB13 [0028]  0                             0          [???..???)                           (throw )                     i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

I suspect making the move BB09 -> BB03 unprofitable with some constant for conditional jumps would fix this.

Since 3-opt currently optimizes for maximal layout score (only because it's cheaper to sum the weights of edges that now fall through than the weights of edges that don't), I suspect we want to begin by penalizing scores for conditional jumps by some multiplier k, where 0 < k < 1. @AndyAyersMS do you have a recommended starting point for k, or is this a matter of trial and error? I suppose if we want to try modeling something as granular as described in Young et al.'s Near-optimal Intraprocedural Branch Alignment, we're better off refactoring 3-opt to minimize cost instead of maximizing score.
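A minimal sketch of the constant-multiplier idea, assuming a whole-layout fall-through score rather than 3-opt's partition score: fall-through credit from a block that ends in a conditional jump is scaled by k, while unconditional fall-throughs keep full credit. Weights and likelihoods come from the WrapperSum dump; `exit_fallthrough` is a hypothetical ordering in which BB06 falls into BB07, and `penalized_score` is an illustrative name, not a JIT function.

```python
# Toy model (assumption): conditional fall-throughs earn k times their edge
# weight; unconditional fall-throughs earn full credit.
WEIGHT = {"BB01": 1.0, "BB02": 99.41, "BB03": 100.0, "BB04": 52.0,
          "BB05": 48.0, "BB06": 100.0, "BB07": 0.60, "BB08": 0.0}
SUCCS = {
    "BB01": {"BB03": 1.0},
    "BB02": {"BB03": 1.0},
    "BB03": {"BB05": 0.48, "BB04": 0.52},
    "BB04": {"BB06": 1.0},
    "BB05": {"BB06": 1.0},
    "BB06": {"BB02": 0.994, "BB07": 0.00595},
    "BB07": {},
    "BB08": {},
}
COND = {"BB03", "BB06"}  # blocks whose [jump] column reads ( cond )

def penalized_score(order, k):
    """Fall-through score with conditional fall-throughs scaled by k."""
    assert 0.0 < k <= 1.0
    score = 0.0
    for src, dst in zip(order, order[1:]):
        edge_weight = WEIGHT[src] * SUCCS[src].get(dst, 0.0)
        score += edge_weight * (k if src in COND else 1.0)
    return score

three_opt = ["BB01", "BB04", "BB06", "BB02", "BB03", "BB05", "BB07", "BB08"]
exit_fallthrough = ["BB01", "BB02", "BB03", "BB04", "BB06", "BB07", "BB05", "BB08"]

for k in (1.0, 0.5, 0.1):
    print(f"k={k}: 3-opt={penalized_score(three_opt, k):.3f}, "
          f"exit fall-through={penalized_score(exit_fallthrough, k):.3f}")
```

In this toy model both orderings keep the same unconditional fall-throughs (151.41 in total), so a constant k only scales the conditional terms (147.4 for the 3-opt ordering versus 52.595), and the 3-opt ordering still scores higher for every k > 0. Whether the real partition score behaves the same way is beyond what this sketch can show.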

AndyAyersMS commented 2 weeks ago

penalizing scores for conditional jumps by some multiplier k

I would think the value of k would be dependent on the likelihood of branching; something like k = 1 - (likelihood of branching). But this isn't quite right because a highly predictable branch should be somewhat cheaper than a less predictable branch (and we can use likelihoods close to 1 as indicators of predictability).

But I agree it is confusing to think in benefit terms, as I really think of this as a cost minimization problem....
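One way to phrase the layout problem as cost minimization is sketched below, under stated assumptions: every flow edge is charged its weight times a unit cost that depends on whether it falls through and whether its source ends in a conditional jump. The three unit costs are hypothetical knobs, `layout_cost` is an illustrative name, and the model charges each edge independently, ignoring the extra unconditional jump needed when neither successor of a conditional block is adjacent. Data are again taken from the WrapperSum dump.

```python
# Per-edge cost model with hypothetical unit costs (not the JIT's code).
WEIGHT = {"BB01": 1.0, "BB02": 99.41, "BB03": 100.0, "BB04": 52.0,
          "BB05": 48.0, "BB06": 100.0, "BB07": 0.60, "BB08": 0.0}
SUCCS = {
    "BB01": {"BB03": 1.0},
    "BB02": {"BB03": 1.0},
    "BB03": {"BB05": 0.48, "BB04": 0.52},
    "BB04": {"BB06": 1.0},
    "BB05": {"BB06": 1.0},
    "BB06": {"BB02": 0.994, "BB07": 0.00595},
    "BB07": {},
    "BB08": {},
}
COND = {"BB03", "BB06"}  # blocks ending in a conditional jump

def layout_cost(order, uncond_jump=1.0, cond_taken=1.0, cond_not_taken=0.0):
    """Weighted cost of all flow edges when blocks are laid out in `order`."""
    next_block = dict(zip(order, order[1:]))
    cost = 0.0
    for src, succs in SUCCS.items():
        for dst, likelihood in succs.items():
            edge_weight = WEIGHT[src] * likelihood
            falls_through = next_block.get(src) == dst
            if src in COND:
                unit = cond_not_taken if falls_through else cond_taken
            else:
                unit = 0.0 if falls_through else uncond_jump
            cost += edge_weight * unit
    return cost

initial = ["BB01", "BB02", "BB03", "BB04", "BB06", "BB05", "BB07", "BB08"]
three_opt = ["BB01", "BB04", "BB06", "BB02", "BB03", "BB05", "BB07", "BB08"]
exit_fallthrough = ["BB01", "BB02", "BB03", "BB04", "BB06", "BB07", "BB05", "BB08"]

for name, order in [("initial", initial), ("3-opt", three_opt),
                    ("exit fall-through", exit_fallthrough)]:
    print(f"{name}: cost={layout_cost(order):.3f}")
```

With the default unit costs (1, 1, 0), a layout's cost is the total edge weight (400.405 here) minus its fall-through score, so minimizing cost picks the same ordering as maximizing score. The benefit of the cost form is that the unit costs can then vary independently, for example to make a taken conditional branch more expensive than an unconditional jump.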

LoopedBard3 commented 1 week ago

GitHub missed linking the original PR: https://github.com/dotnet/runtime/pull/103450