Revisit LSRA block sequence strategy

kunalspathak commented 2 years ago

LSRA comes up with a block order in which it would allocate register for the code inside it. It would sequence all the predecessors of a block before scheduling a block. It also compares the bbWeight of a block and gives preference to the block with higher weights. It is not uncommon to have multiple blocks of same weight (either because of lack of enough profile data or other reasons) and in such cases, we rely on the block number of the block as seen below.

https://github.com/dotnet/runtime/blob/96ef47b2c6d6591cebba4eb260763576621e8236/src/coreclr/jit/lsra.cpp#L1030-L1060

This is arguably not a good heuristic to arrange LSRA block sequencing. The bbNum are purely dependent on the how the blocks are laid out our data structure and might not represent the flow graph. The original thinking could be that the blocks with lower bbNum are executed first than the ones with higher numbers, but still, it is not the optimal strategy to have. If the blocks are renumbered, it can drastically change the allocation strategy because of different block sequencing coming out of it. This was seen in https://github.com/dotnet/runtime/pull/66967 where I had to skip renumbering the blocks so the impact of "unreachable block elimination" doesn't regress register allocation.

Just to give a sense, I tried following strategies and the impact it has on code size is huge.

Strategy	Methods improved	Methods regressed	Code size impact (in bytes)
Prefer `block1` over `block2` if `block1` has more predecessors than those of `block2`.	2292	3313	23028
Prefer `block1` over `block2` if `block1` has less predecessors than those of `block2`.	2669	3141	3079
Prefer `block1` over `block2` if `block1` has more `bbReach` nodes than those of `block2`.	3495	3923	14156
Prefer `block1` over `block2` if `block1` has less `bbReach` nodes than those of `block2`.	2642	3425	4155

Some other strategies could be:

Prefer blocks depending on if they have critical in/out edges.
Prefer blocks depending on if they have EH boundary in/out edges.
Prefer blocks if they are in loop (this could be mostly taken care by bbWeight but should double check).

A clever way is to have a mix of above listed strategies in choosing the block sequencing.

category:design theme:register-allocator skill-level:expert cost:medium impact:large

kunalspathak commented 2 years ago

@dotnet/jit-contrib

AndyAyersMS commented 2 years ago

See also https://github.com/dotnet/runtime/issues/9833#issuecomment-372847082 (a few years old now)

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT See info in area-owners.md if you want to be subscribed.

Issue Details

LSRA comes up with a block order in which it would allocate register for the code inside it. It would sequence all the predecessors of a block before scheduling a block. It also compares the `bbWeight` of a block and gives preference to the block with higher weights. It is not uncommon to have multiple blocks of same weight (either because of lack of enough profile data or other reasons) and in such cases, we rely on the block number of the block as seen below. https://github.com/dotnet/runtime/blob/96ef47b2c6d6591cebba4eb260763576621e8236/src/coreclr/jit/lsra.cpp#L1030-L1060 This is arguably not a good heuristic to arrange LSRA block sequencing. The `bbNum` are purely dependent on the how the blocks are laid out our data structure and might not represent the flow graph. The original thinking could be that the blocks with lower `bbNum` are executed first than the ones with higher numbers, but still, it is not the optimal strategy to have. If the blocks are renumbered, it can drastically change the allocation strategy because of different block sequencing coming out of it. This was seen in https://github.com/dotnet/runtime/pull/66967 where I had to skip renumbering the blocks so the impact of "unreachable block elimination" doesn't regress register allocation. Just to give a sense, I tried following strategies and the impact it has on code size is huge. | Strategy | Methods improved | Methods regressed | Code size impact (in bytes) | |--------------------------------------------------------------------------------------------|------------------|-------------------|-----------------------------| | Prefer `block1` over `block2` if `block1` has **more predecessors** than those of `block2`. | 2292 | 3313 | 23028 | | Prefer `block1` over `block2` if `block1` has **less predecessors** than those of `block2`. | 2669 | 3141 | 3079 | | Prefer `block1` over `block2` if `block1` has **more `bbReach`** nodes than those of `block2`. | 3495 | 3923 | 14156 | | Prefer `block1` over `block2` if `block1` has **less `bbReach`** nodes than those of `block2`. | 2642 | 3425 | 4155 | Some other strategies could be: 1. Prefer blocks depending on if they have critical in/out edges. 2. Prefer blocks depending on if they have EH boundary in/out edges. 3. Prefer blocks if they are in loop (this could be mostly taken care by `bbWeight` but should double check). A clever way is to have a mix of above listed strategies in choosing the block sequencing.

Author:	kunalspathak
Assignees:	kunalspathak
Labels:	`area-CodeGen-coreclr`
Milestone:	-

kunalspathak commented 2 years ago

Some more ideas:

Ideally, for a block, we would like to schedule a block that will most likely be the successor of it (pgo wise). We have that information (may be outdated), but currently, our requirements are to sequence all the predecessors before scheduling that block. It might be tricky to get both these heuristics simultaneously.
Another idea would be to produce minimum spanning tree where blocks are connected with each other based upon the number of variables they share. Stronger the blocks connected, more variables they have in common. When time comes to scheduling a block for LSRA, we can look at which one is strongly connected to the current block and sequence that.

dotnet / runtime

Revisit LSRA block sequence strategy #66994