Closed AndyAyersMS closed 2 months ago
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
| Author: | AndyAyersMS |
|---|---|
| Assignees: | - |
| Labels: | `area-CodeGen-coreclr` |
| Milestone: | 9.0.0 |
As a prerequisite for some of the block reordering work we'll likely need to change loop alignment to be more centralized. We currently identify the initial candidate blocks to place loop alignment instructions in during loop finding and apply some heuristics when computing loop side effects during VN. We will probably need to defer all of these decisions to happen after block reordering. I am also inclined to say that we should just recompute the loops at that point, instead of trying to maintain loop information -- we have a lot of code that works hard to maintain `bbNatLoopNum` all the way into the backend that we could remove. Since the block reordering is likely to need a DFS as well, the extra TP we'll end up paying is just for the loop identification, which is not that much (tpdiff).
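To illustrate the "just recompute the loops" idea, here is a minimal sketch of loop identification driven by a single DFS. It is not the JIT's implementation: `Graph`, `Loops`, `CollectBackEdges`, and `FindLoops` are hypothetical names, blocks are plain integers, and the flow graph is assumed to be reducible, so every DFS back edge is treated as a loop back edge without a dominance check.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the flow graph: succs[b] lists the successors of block b.
using Graph = std::vector<std::vector<int>>;
// Maps each loop header to the blocks of its natural loop (header included).
using Loops = std::map<int, std::set<int>>;

enum class Visit { NotSeen, OnStack, Done };

// Depth-first walk that records every edge whose target is still on the DFS stack.
static void CollectBackEdges(const Graph& succs, int block, std::vector<Visit>& state,
                             std::vector<std::pair<int, int>>& backEdges)
{
    state[block] = Visit::OnStack;
    for (int succ : succs[block])
    {
        if (state[succ] == Visit::OnStack)
            backEdges.emplace_back(block, succ); // latch -> header
        else if (state[succ] == Visit::NotSeen)
            CollectBackEdges(succs, succ, state, backEdges);
    }
    state[block] = Visit::Done;
}

// Recomputes loops from scratch. Assumes a reducible graph (no dominance check).
Loops FindLoops(const Graph& succs, int entry)
{
    assert(entry >= 0 && static_cast<std::size_t>(entry) < succs.size());
    std::vector<Visit> state(succs.size(), Visit::NotSeen);
    std::vector<std::pair<int, int>> backEdges;
    CollectBackEdges(succs, entry, state, backEdges);

    // Predecessor lists, restricted to blocks the DFS reached.
    std::vector<std::vector<int>> preds(succs.size());
    for (std::size_t b = 0; b < succs.size(); b++)
    {
        if (state[b] == Visit::NotSeen)
            continue;
        for (int succ : succs[b])
            preds[succ].push_back(static_cast<int>(b));
    }

    Loops loops;
    for (const std::pair<int, int>& edge : backEdges)
    {
        const int latch  = edge.first;
        const int header = edge.second;
        std::set<int>& body = loops[header];
        body.insert(header);

        // Walk predecessors backwards from the latch until the header stops the walk.
        std::vector<int> worklist;
        if (body.insert(latch).second)
            worklist.push_back(latch);
        while (!worklist.empty())
        {
            const int block = worklist.back();
            worklist.pop_back();
            for (int pred : preds[block])
                if (body.insert(pred).second)
                    worklist.push_back(pred);
        }
    }
    return loops;
}
```

Because nothing here depends on state carried over from earlier phases, a pass like this could run after block reordering without any per-block loop number being kept up to date in between.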
Potential .NET 10 items
Block layout:
Flowgraph Modernization:
- [...] `Compiler::fgUpdateFlowGraph`, particularly those that use block list pointers (`bbNext` and `bbPrev`) instead of control flow to make decisions; now that various implicit fallthrough invariants are gone, some of the transformations here might be underutilized or irrelevant
- [...] `fgUpdateFlowGraph`'s transformations to aid block layout. In particular, the old block layout would run `Compiler::fgOptimizeBranch` if it decided against moving a `BBJ_ALWAYS` that jumps to a `BBJ_COND`. We currently don't do any such pass, though if we decide not to compact the `BBJ_ALWAYS`, cloning its successor's condition can help us avoid "double-jumping" (comment).
- [...] `BBJ_THROW` blocks are cold, so even if there is nontrivial flow into a throw block, `fgMoveColdBlocks` will do the "wrong" thing and move it to the end of the method. For methods that always throw, the performance impact of layout is probably trivial, though if we can get a close-to-optimal layout from the RPO pass, we'll avoid wasting time during 3-opt.
- [...] `fgFindInsertPoint`.

cc @AndyAyersMS, feel free to add to this.
I think that's a good start. I'm going to move this issue to .NET 10 for now, later we can decide if we want to split off a new issue or just revise this one.
Let's split off a new issue for future work.
Overview
The current block layout algorithm in the JIT is based on local permutations of block order. It is complicated and likely far from optimal. We would like to improve the overall block layout algorithm used by the JIT, in particular adopting a global cost-minimizing approach to layout, for instance one in the style of Young et al.'s Near-optimal intraprocedural branch alignment. Additional complexities arise in our case because of various EH reporting requirements, so the JIT cannot freely reorder all blocks, but we should be able to apply the global ordering techniques within EH regions.
Before we can tackle this problem there are several important (and sizeable) prerequisites, which we can lump together as "flow graph modernization." There are a lot of details here, but at a high level:
It is not yet clear how much progress we can make during .NET 9. The list of items below is preliminary and subject to change.
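To make "global cost-minimizing" a bit more concrete, here is a minimal sketch of the kind of objective such an approach could minimize. It is a deliberately simplified model, not the one from the paper and not JIT code: the cost of a layout is the total weight of edges that do not fall through to the next block. `WeightedEdge` and `LayoutCost` are hypothetical names, and the layout is assumed to be a permutation of block numbers `0..n-1`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical profile-weighted flow edge between two block numbers.
struct WeightedEdge
{
    int    source;
    int    target;
    double weight; // how often the edge is taken, per the profile
};

// Sums the weight of every edge that is NOT a fall-through in the given layout.
// Assumes `layout` is a permutation of 0..n-1.
double LayoutCost(const std::vector<int>& layout, const std::vector<WeightedEdge>& edges)
{
    // position[b] is the index of block b in the layout.
    std::vector<int> position(layout.size(), -1);
    for (std::size_t i = 0; i < layout.size(); i++)
    {
        const int block = layout[i];
        assert(block >= 0 && static_cast<std::size_t>(block) < layout.size());
        assert(position[block] == -1); // each block appears exactly once
        position[block] = static_cast<int>(i);
    }

    double cost = 0.0;
    for (const WeightedEdge& edge : edges)
    {
        const bool fallsThrough = position[edge.target] == position[edge.source] + 1;
        if (!fallsThrough)
            cost += edge.weight; // this edge needs a taken branch
    }
    return cost;
}
```

For a diamond with a hot path `0 -> 2 -> 3` (weight 90) and a cold path `0 -> 1 -> 3` (weight 10), this model scores the layout `0, 1, 2, 3` at 100 and the layout `0, 2, 3, 1` at 20. A global algorithm searches for the ordering with the lowest such score, and under the EH constraint above it would do so separately within each EH region.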
Motivation
Past studies have shown that the two phases that benefit most from block-level PGO data are inlining and block layout. In a previous compiler project, the net benefit from PGO was on the order of 15%, with about 12% attributable to inlining, and 2% to improved layout.
The current JIT is likely seeing a much smaller benefit from layout. The goal here is to ensure that we are using accurate PGO data to make informed decisions about the ordering of blocks, with the hope of realizing perhaps a 1 or 2% net benefit across a wide range of applications (with some benefiting much more, and others not at all).
Flow Graph Modernization
- [...] `BasicBlock` behind setters/getters (`bbNext`, `bbJumpKind`, `bbJumpTarget`, ...)
- [...] `bbNext` for blocks with fall-through jump kinds
- [...] `bbNext`. Might be best to do this gradually, working front-to-back through the phases, and then restoring correspondence. Eventually we'll be restoring right at block layout. Main work here is to root out places that implicitly rely on `bbNext` ordering having some important semantic. Note the `bbNext` order can still reflect likely layout order (e.g., it can agree with fall throughs after we renumber/etc.).
- [...] `BasicBlock` references for `bbJumpTarget` with the appropriate `FlowEdge`. Consider (perhaps) linking `FlowEdges` in a successor list as well as predecessor list.
- [...] `FlowEdge` and the code that sets them, update all customers to use new likelihood based weights.
- [...] `BB_UNITY_WEIGHT` to be 1.0
Block Layout
- [...] `Compiler::fgFindInsertPoint`, and similar logic that attempts to maintain reasonable orderings before block layout is run (see comment).

cc @amanasifkhalid @dotnet/jit-contrib
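For the likelihood-based weights mentioned in the Flow Graph Modernization list, a minimal sketch of the relationship might look like the following. The types are hypothetical simplified stand-ins (`BlockSketch`, `EdgeSketch`, `kUnityWeight`), not the JIT's actual `BasicBlock`, `FlowEdge`, or `BB_UNITY_WEIGHT` definitions. The assumption is that each edge stores only a likelihood in `[0, 1]` and that its weight is derived on demand from the source block's weight.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical unit weight: a block executed once per method call.
constexpr double kUnityWeight = 1.0;

// Hypothetical simplified block: only the profile weight matters here.
struct BlockSketch
{
    double weight;
};

// Hypothetical simplified edge: stores a likelihood instead of its own weight.
struct EdgeSketch
{
    int    source;
    int    target;
    double likelihood; // probability of taking this edge when leaving `source`
};

// Derives the edge weight from the source block weight and the edge likelihood.
double EdgeWeight(const std::vector<BlockSketch>& blocks, const EdgeSketch& edge)
{
    assert(edge.likelihood >= 0.0 && edge.likelihood <= 1.0);
    return blocks[edge.source].weight * edge.likelihood;
}

// Checks that the likelihoods of all edges leaving `source` sum to 1 within a tolerance.
// Returns false if `source` has no outgoing edges in `edges`.
bool OutgoingLikelihoodsSumToOne(const std::vector<EdgeSketch>& edges, int source,
                                 double tolerance = 1e-9)
{
    double sum     = 0.0;
    bool   sawEdge = false;
    for (const EdgeSketch& edge : edges)
    {
        if (edge.source == source)
        {
            sum += edge.likelihood;
            sawEdge = true;
        }
    }
    return sawEdge && std::fabs(sum - 1.0) <= tolerance;
}
```

One consequence of this shape is that changing a block's weight implicitly rescales all of its outgoing edge weights, so there is no separate per-edge weight to keep in sync.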