Open performanceautofiler[bot] opened 2 weeks ago
@amanasifkhalid FYI
Improvements:
Regressions:
I took a look at a few of the regressions, and many of them seem to stem from mis-rotated loops. Because the cost model currently doesn't differentiate between conditional and unconditional jumps, 3-opt tends to make naive decisions about moving loop exits. For example, from Struct.SpanWrapper.WrapperSum
:
*************** In fgSearchImprovedLayout()
Initial BasicBlocks
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004] 1 1 2 [???..???)-> BB03(1) (always) i LIR IBC internal
BB02 [0001] 1 BB06 99.41 167 [00C..019)-> BB03(1) (always) i LIR IBC loophead bwd bwd-target
BB03 [0002] 2 BB02,BB01 100 168 [019..01A)-> BB05(0.48),BB04(0.52) ( cond ) i LIR IBC bwd bwd-src osr-entry
BB04 [0010] 1 BB03 52.00 87 [019..01A)-> BB06(1) (always) i LIR IBC bwd
BB06 [0012] 2 BB04,BB05 100 168 [019..022)-> BB02(0.994),BB07(0.00595) ( cond ) i LIR IBC bwd bwd-src
BB05 [0011] 1 BB03 48 81 [019..01A)-> BB06(1) (always) i LIR IBC bwd
BB07 [0003] 1 BB06 0.60 1 [022..024) (return) i LIR IBC
BB08 [0013] 0 0 [???..???) (throw ) i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Running 3-opt for main method body
Creating fallthrough for BB06 -> BB02 (current partition score = 87.394958, new partition score = 167.067227)
Creating fallthrough for BB04 -> BB06 (current partition score = 87.394958, new partition score = 168.067227)
*************** Finishing PHASE Optimize layout
Trees after Optimize layout
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0004] 1 1 2 [???..???)-> BB03(1) (always) i LIR IBC internal
BB04 [0010] 1 BB03 52.00 87 [019..01A)-> BB06(1) (always) i LIR IBC bwd
BB06 [0012] 2 BB04,BB05 100 168 [019..022)-> BB02(0.994),BB07(0.00595) ( cond ) i LIR IBC bwd bwd-src
BB02 [0001] 1 BB06 99.41 167 [00C..019)-> BB03(1) (always) i LIR IBC loophead bwd bwd-target
BB03 [0002] 2 BB02,BB01 100 168 [019..01A)-> BB05(0.48),BB04(0.52) ( cond ) i LIR IBC bwd bwd-src osr-entry
BB05 [0011] 1 BB03 48 81 [019..01A)-> BB06(1) (always) i LIR IBC bwd
BB07 [0003] 1 BB06 0.60 1 [022..024) (return) i LIR IBC
BB08 [0013] 0 0 [???..???) (throw ) i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
If we can tweak the cost model such that it decides creating fallthrough for BB06 -> BB02
is unprofitable, then 3-opt will instead create fallthrough for BB06 -> BB07
, thus creating the ideal loop exit shape. As a consequence, we will push BB05
further out-of-line; in order to consider moving BB05
back into the loop body, we'd probably have to model forward vs backward jumps in the cost model to make such a move profitable.
PerfLabTests.LowLevelPerf.ForEachOverList100Elements
has a similar shape:
*************** In fgSearchImprovedLayout()
Initial BasicBlocks
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0010] 1 1 116 [???..???)-> BB04(1) (always) i LIR IBC internal
BB03 [0003] 1 BB09 98.56 11470 [015..021)-> BB04(1) (always) i LIR IBC loophead bwd bwd-target
BB04 [0004] 3 BB02,BB03,BB01 100 11637 [021..022)-> BB11(0.2),BB05(0.8) ( cond ) i LIR IBC bwd bwd-src osr-entry
BB05 [0018] 1 BB04 80 9309 [021..022)-> BB07(0.48),BB06(0.52) ( cond ) i LIR IBC bwd
BB06 [0019] 1 BB05 41.60 4841 [021..022)-> BB09(1) (always) i LIR IBC idxlen bwd
BB07 [0020] 1 BB05 58.40 6796 [021..022)-> BB09(1) (always) i LIR IBC bwd
BB09 [0021] 2 BB06,BB07 100 11637 [021..02A)-> BB03(0.986),BB10(0.0144) ( cond ) i LIR IBC bwd bwd-src
BB10 [0005] 1 BB09 1.44 167 [02A..046)-> BB02(0.994),BB12(0.00595) ( cond ) i LIR IBC bwd
BB02 [0001] 1 BB10 1.44 167 [00C..013)-> BB04(1) (always) i LIR IBC loophead nullcheck bwd bwd-target
BB12 [0009] 1 BB10 0.01 1 [046..048) (return) i LIR IBC
BB11 [0023] 1 BB04 0 0 [021..022) (throw ) i LIR IBC rare hascall gcsafe bwd
BB13 [0028] 0 0 [???..???) (throw ) i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Running 3-opt for main method body
Creating fallthrough for BB09 -> BB03 (current partition score = 6962.966716, new partition score = 11469.746967)
Creating fallthrough for BB07 -> BB09 (current partition score = 0.000000, new partition score = 6795.899489)
*************** Finishing PHASE Optimize layout
Trees after Optimize layout
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds weight IBC [IL range] [jump] [EH region] [flags]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
BB01 [0010] 1 1 116 [???..???)-> BB04(1) (always) i LIR IBC internal
BB07 [0020] 1 BB05 58.40 6796 [021..022)-> BB09(1) (always) i LIR IBC bwd
BB09 [0021] 2 BB06,BB07 100 11637 [021..02A)-> BB03(0.986),BB10(0.0144) ( cond ) i LIR IBC bwd bwd-src
BB03 [0003] 1 BB09 98.56 11470 [015..021)-> BB04(1) (always) i LIR IBC loophead bwd bwd-target
BB04 [0004] 3 BB02,BB03,BB01 100 11637 [021..022)-> BB11(0.2),BB05(0.8) ( cond ) i LIR IBC bwd bwd-src osr-entry
BB05 [0018] 1 BB04 80 9309 [021..022)-> BB07(0.48),BB06(0.52) ( cond ) i LIR IBC bwd
BB06 [0019] 1 BB05 41.60 4841 [021..022)-> BB09(1) (always) i LIR IBC idxlen bwd
BB10 [0005] 1 BB09 1.44 167 [02A..046)-> BB02(0.994),BB12(0.00595) ( cond ) i LIR IBC bwd
BB02 [0001] 1 BB10 1.44 167 [00C..013)-> BB04(1) (always) i LIR IBC loophead nullcheck bwd bwd-target
BB12 [0009] 1 BB10 0.01 1 [046..048) (return) i LIR IBC
BB11 [0023] 1 BB04 0 0 [021..022) (throw ) i LIR IBC rare hascall gcsafe bwd
BB13 [0028] 0 0 [???..???) (throw ) i LIR rare keep internal
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
I suspect making the move BB09 -> BB03
unprofitable with some constant for conditional jumps would fix this.
Since 3-opt currently optimizes for maximal layout score (only because it's cheaper to sum the weights of edges that now fall through, rather than sum the weights of edges that now don't fall through), I suspect we want to begin by penalizing scores for conditional jumps by some multiplier k
, where 0 < k < 1
. @AndyAyersMS do you have a recommended starting point for k
, or is this a matter of trial and error? I suppose if we want to try modeling something as granular as described in Young et. al.'s Near-optimal Intraprocedural Branch Alignment, we're better off refactoring 3-opt to minimize cost instead of maximizing score.
penalizing scores for conditional jumps by some multiplier
k
I would think the value of k
would be dependent on the likelihood of branching; something like k = 1 - (likelihood of branching)
. But this isn't quite right because a highly predictable branch should be somewhat cheaper than a less predictable branch (and we can use likelihoods close to 1 as indicators of predictability).
But I agree it is confusing to think in benefit terms, as I really think of this as a cost minimization problem....
Github missed linking the original PR: https://github.com/dotnet/runtime/pull/103450
Run Information
Regressions in System.Collections.Tests.Perf_BitArray
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
Run Information
Regressions in Span.IndexerBench
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
Run Information
Regressions in System.Globalization.Tests.StringSearch
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
Run Information
Regressions in PerfLabTests.LowLevelPerf
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
Run Information
Regressions in System.Collections.IterateForEach<Int32>
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
Run Information
Regressions in System.Memory.Span<Int32>
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
Run Information
Regressions in System.Tests.Perf_Char
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
Run Information
Regressions in Struct.SpanWrapper
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
Run Information
Regressions in System.Collections.Tests.Perf_PriorityQueue<Int32, Int32>
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
Run Information
Regressions in Span.Sorting
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md