Open Amanieu opened 2 years ago
FYI there is already an issue on the wasmtime side: https://github.com/bytecodealliance/wasmtime/issues/2747
@Amanieu I think that placing cold blocks at the end of the function in the linear block order should basically just work, as you say.
The two issues in the design doc could potentially have an impact on compile time (first issue -- because we'll have longer, discontiguous live ranges) and code quality (second issue -- because an out-of-lined block from an inner loop, sunk to the end, could cause the approximate metric to treat the entire remainder of the function as a hot inner loop body).
For the second, I think a simple-enough answer is to stop the approximate-loop-depth scan before any cold blocks (and treat them as zero depth). That would also have a side-effect of making the spill cost low in the cold paths, which is what we want.
So I can imagine adding a method to the `Function` trait, something like `fn first_cold_block(&self) -> Option<Block>`, and then use that to end this scan early, then that should be it. Does that seem reasonable to you?
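To make the proposal concrete, here is a minimal sketch of an approximate loop-depth scan that stops at the first cold block. It is not regalloc2's actual implementation: `ToyFunc`, its `preds` field, and `approx_loop_depth` are hypothetical stand-ins, blocks are plain `usize` indices rather than `Block`, and ignoring backedges whose source is a cold block is an assumption of this toy model.

```rust
/// Hypothetical, simplified stand-in for a function's CFG.
/// `preds[b]` lists the predecessor block indices of block `b`.
struct ToyFunc {
    preds: Vec<Vec<usize>>,
    /// Index of the first cold block, if any (all later blocks are cold).
    first_cold: Option<usize>,
}

impl ToyFunc {
    /// Mirrors the shape of the proposed trait method, using `usize`
    /// instead of regalloc2's `Block` type.
    fn first_cold_block(&self) -> Option<usize> {
        self.first_cold
    }
}

/// Approximate loop depth per block, computed in one linear pass.
/// An edge from a block with an index >= `b` into `b` is treated as a
/// backedge: the loop is "open" from `b` until its source block is passed.
/// Cold blocks are never scanned and keep depth 0.
fn approx_loop_depth(f: &ToyFunc) -> Vec<u32> {
    let n = f.preds.len();
    // End of the hot region: the scan stops here.
    let hot_end = f.first_cold_block().unwrap_or(n);
    let mut depth = vec![0u32; n];
    // Sources of backedges whose loops are currently open.
    let mut open: Vec<usize> = Vec::new();

    for b in 0..hot_end {
        for &p in &f.preds[b] {
            // Assumption of this sketch: backedges from cold blocks are ignored.
            if p >= b && p < hot_end {
                open.push(p);
            }
        }
        depth[b] = open.len() as u32;
        // Close every loop whose backedge originates at this block.
        open.retain(|&p| p != b);
    }
    depth
}
```

For a CFG with edges 0→1, 1→2, 1→4, 4→2, 2→1, 2→3, where block 4 is an out-of-lined block from inside the loop, the scan without a cold marker yields depths `[0, 1, 2, 1, 1]`, so the loop exit (block 3) is treated as loop body. With `first_cold = Some(4)` it yields `[0, 1, 1, 0, 0]`.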
It would be useful to be able to mark some blocks as "cold", meaning that they lie on rarely taken paths. The register allocator should prefer placing spills and moves in cold blocks where possible.
It turns out that very little needs to be done if we take advantage of the block ordering by requiring all cold blocks to be after normal blocks in terms of block index and instruction indices. This has the following consequences:
`blockparams_out`, bundle merging will attempt to merge all branch parameters coming from normal blocks before attempting to merge ones coming from cold blocks.
My only concern is that the block order will no longer be in RPO, which is the ordering recommended by the documentation. While regalloc2 will still function properly, I am less sure of the impact it may have on the heuristics.
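The ordering requirement (all cold blocks after all normal blocks) could be met on the client side with a stable partition of an existing RPO. The sketch below is only an illustration under that assumption: `sink_cold_blocks` and the `is_cold` predicate are hypothetical names, and blocks are plain `u32` ids rather than regalloc2 types.

```rust
/// Moves cold blocks to the end of a block order while keeping the
/// relative order inside the hot group and inside the cold group.
/// Returns the new order and the position of the first cold block, if any.
fn sink_cold_blocks(
    rpo: &[u32],
    is_cold: impl Fn(u32) -> bool,
) -> (Vec<u32>, Option<usize>) {
    // `Iterator::partition` visits items in order, so each group keeps
    // the relative order it had in the input.
    let (hot, cold): (Vec<u32>, Vec<u32>) =
        rpo.iter().copied().partition(|&b| !is_cold(b));

    // The cold region starts right after the last hot block.
    let first_cold = if cold.is_empty() { None } else { Some(hot.len()) };

    let mut order = hot;
    order.extend(cold);
    (order, first_cold)
}
```

Because the partition is stable, the hot blocks are still in RPO relative to each other, and so are the cold blocks; only edges that cross between the two regions can violate RPO. Whether that is enough for the heuristics is the open question raised above.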