Continuation of #93020. During the .NET 9 development cycle, we removed many of the JIT flowgraph implementation's implicit fall-through invariants, and introduced a new block layout strategy based on a reverse post-order traversal of the graph. For .NET 10, we'd like to push this work further in both directions, with the ultimate goals of zero dependence on lexical block ordering in the JIT's frontend, and a global cost-optimizing layout algorithm in the JIT's backend. Below is an early estimate of what each item entails:
Flowgraph Modernization
[x] Move block layout to the backend, after lowering/LSRA. At this point, we know the JIT won't introduce new blocks, so one reordering pass should be sufficient. Note that the flowgraph transformation steps in fgUpdateFlowGraph that we usually run in conjunction with layout aren't designed to run after lowering, so we will likely need to decouple flow opts from layout to make this work.
[x] #107483
[x] #107634
[ ] Ensure backend phases aren't sensitive to block ordering before layout. In particular, LSRA uses its own traversal logic for visiting blocks. Modifying this traversal logic to be agnostic to lexical ordering may facilitate moving layout to after LSRA.
[x] #107927
[x] #108086
[ ] Remove premature ordering logic during basic block creation. Block creation helpers like fgNewBBinRegion may search the block list for insertion points that won't break up existing fall-through. In the JIT frontend, it should make no difference to optimization potential if we just insert new blocks at the end of the list, or at the end of an EH region. Doing this work early should help expose frontend phases that are still sensitive to lexical block ordering (see next task).
[x] #107371
[x] #107403
[ ] #107585
[ ] #107419
[ ] Refactor frontend phases to not depend on lexical block ordering. As of writing, we know of a few phases that should be graph-based:
[ ] Loop inversion
[ ] #109346
[ ] Switch recognition
[ ] Flowgraph simplification
[ ] optSetBlockWeights (see the Profile Maintenance section)
[ ] Continue to remove premature checks for fall-through behavior. The removal of the BBJ_NONE block type left behind breadcrumbs in various phases that we ought to clean up, now that we can model flow explicitly.
[ ] Consider enforcing stronger flowgraph invariants (such as no uncompacted blocks) between phases to reduce the burden of work on fgUpdateFlowGraph.
[ ] Continue deferred .NET 9 items
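To illustrate what "graph-based" means for the items above, here is a minimal sketch (in Python, not JIT code) of a reverse post-order traversal that depends only on successor edges and never consults the order of a block list. The function name `reverse_post_order`, the `succs` dictionary, and the block names are all hypothetical and exist only for this illustration.

```python
def reverse_post_order(succs, entry):
    """Return the blocks reachable from `entry` in reverse post-order.

    `succs` maps a block name to a list of successor names (hypothetical model).
    """
    visited = {entry}
    post_order = []
    # Iterative DFS; each stack entry pairs a block with an iterator over its successors.
    stack = [(entry, iter(succs.get(entry, ())))]
    while stack:
        block, successors = stack[-1]
        for succ in successors:
            if succ not in visited:
                visited.add(succ)
                stack.append((succ, iter(succs.get(succ, ()))))
                break
        else:
            # All successors handled: the block is finished.
            post_order.append(block)
            stack.pop()
    return post_order[::-1]


# A diamond: A branches to B and C, which both jump to D.
diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
# The same graph with its blocks listed in a different (lexical) order.
diamond_shuffled = {"D": [], "C": ["D"], "B": ["D"], "A": ["B", "C"]}

rpo = reverse_post_order(diamond, "A")
rpo_shuffled = reverse_post_order(diamond_shuffled, "A")
```

Both calls visit the blocks in the same order, because the traversal starts at the entry block and follows edges. This is the property that makes it safe to append new blocks at the end of the list.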
Block Layout
Ideally, the items below get us to a state where block layout produces the "best" ordering it can, given the profile data it has on hand. If the layout is subpar due to missing/inconsistent profile data, we can at least eliminate the layout strategy as the culprit.
[x] Implement 3-opt pass on top of the RPO-based layout, modeling layout cost with edge weights
[x] #103450
[ ] Consider modeling cost of (un)conditional and forward/backward branches in layout cost for 3-opt
[ ] Consider how 3-opt's layout decisions may affect hot/cold splitting
[ ] Consider how we can achieve acceptable throughput, while running for enough iterations to achieve near-optimal layout
[ ] Continue deferred .NET 9 items
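As a rough illustration of the 3-opt idea, the sketch below (Python, not the JIT's implementation) scores a layout by the total weight of edges that become fall-through, then greedily swaps two adjacent segments whenever that raises the score. The segment-swap move, the exhaustive search over cut points, and the names `fallthrough_weight` and `three_opt_pass` are assumptions made for this example. The cost model also ignores branch direction and hot/cold splitting, which the items above leave open.

```python
from itertools import combinations


def fallthrough_weight(layout, edge_weights):
    """Total weight of edges whose target is laid out right after their source."""
    return sum(edge_weights.get((a, b), 0) for a, b in zip(layout, layout[1:]))


def three_opt_pass(layout, edge_weights):
    """Apply the best segment swap repeatedly until no swap improves the score."""
    layout = list(layout)
    improved = True
    while improved:
        improved = False
        best = fallthrough_weight(layout, edge_weights)
        n = len(layout)
        # Cut points 1 <= i < j < k <= n, so the entry block (index 0) stays first.
        for i, j, k in combinations(range(1, n + 1), 3):
            # Swap the two middle segments: S1 S2 S3 S4 -> S1 S3 S2 S4.
            candidate = layout[:i] + layout[j:k] + layout[i:j] + layout[k:]
            score = fallthrough_weight(candidate, edge_weights)
            if score > best:
                best, best_layout, improved = score, candidate, True
        if improved:
            layout = best_layout
    return layout


# Hypothetical profile: the "hot" path is taken 90% of the time.
weights = {
    ("entry", "hot"): 90,
    ("entry", "cold"): 10,
    ("hot", "exit"): 90,
    ("cold", "exit"): 10,
}
initial = ["entry", "cold", "hot", "exit"]
optimized = three_opt_pass(initial, weights)
```

Each pass in this sketch examines every triple of cut points, which is far too slow for large methods. That cost is the throughput concern raised above, and a practical implementation would need to restrict the candidates it considers.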
Profile Maintenance
[ ] Continue expanding profile consistency checks through the JIT's frontend. Currently, we bail after inlining.
[ ] Consider replacing optSetBlockWeights with the new profile synthesis implementation. The former frequently produces nonsensical weights for loops, as it relies on a lexical traversal of the block list to identify loops. Fixing this may improve JitOptRepeat performance.
[ ] Consider running profile synthesis right before layout.
[ ] Allow profile data to override the JIT's heuristics more explicitly. For example, if profile data suggests a BBJ_THROW block is hot, then order it as such (this particular example is not as perf-sensitive, though).
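To show why lexical loop identification is fragile, the sketch below (Python, a deliberately simplified model rather than the actual logic of optSetBlockWeights) compares two ways of finding loop back edges. The lexical heuristic treats any edge to an earlier block in the list as a back edge, while the graph-based version uses a depth-first search from the entry block. The function names and the example graph are hypothetical.

```python
def lexical_back_edges(block_list, succs):
    """Edges whose target appears at or before the source in the block list."""
    pos = {block: index for index, block in enumerate(block_list)}
    return {
        (block, succ)
        for block in block_list
        for succ in succs.get(block, ())
        if pos[succ] <= pos[block]
    }


def dfs_back_edges(entry, succs):
    """Edges whose target is still on the DFS stack, i.e. a DFS-tree ancestor."""
    on_stack, done, back = set(), set(), set()

    def visit(block):
        on_stack.add(block)
        for succ in succs.get(block, ()):
            if succ in on_stack:
                back.add((block, succ))
            elif succ not in done:
                visit(succ)
        on_stack.remove(block)
        done.add(block)

    visit(entry)
    return back


# A single loop: header -> body -> header, with an exit out of the header.
loop = {"entry": ["header"], "header": ["body", "exit"], "body": ["header"], "exit": []}

# Same graph, two block-list orders; in the second, the body precedes the header.
in_order = ["entry", "header", "body", "exit"]
reordered = ["entry", "body", "header", "exit"]

lexical_in_order = lexical_back_edges(in_order, loop)
lexical_reordered = lexical_back_edges(reordered, loop)
graph_based = dfs_back_edges("entry", loop)
```

With the blocks in their original order, both approaches agree on the back edge from `body` to `header`. Once the list is reordered, the lexical heuristic reports `header` to `body` instead, while the DFS result is unchanged because it never looks at the list.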
cc @dotnet/jit-contrib, @AndyAyersMS