Outlining schemes and their problems
Ineffectiveness of cumulative counting
Cumulative counts tend to cross the threshold near exit blocks, which are usually poor splitting points.
Moreover, large counts mostly come from long chains of small- to medium-sized blocks, which almost never cause register shortage when actually run on the target machine.
Our main targets are cases where one or two giant blocks dominate the others, taking up the majority of the function.
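To make the contrast concrete, here is a minimal sketch comparing a cumulative-threshold trigger with a "one or two giant blocks" criterion. It assumes each block is summarized only by its instruction count, in layout order; the helper names, the threshold of 50 and the 0.5 share are hypothetical choices for illustration, not values from our pass.

```python
from typing import List, Optional


def cumulative_split_index(sizes: List[int], threshold: int) -> Optional[int]:
    """Index of the first block at which the running instruction count
    exceeds `threshold`, or None if it never does."""
    total = 0
    for i, size in enumerate(sizes):
        total += size
        if total > threshold:
            return i
    return None


def giant_blocks(sizes: List[int], share: float = 0.5) -> List[int]:
    """Indices of the one or two largest blocks if, together, they account
    for at least `share` of all instructions; otherwise an empty list."""
    total = sum(sizes)
    if total == 0:
        return []
    ranked = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    for k in (1, 2):  # try a single giant block first, then a pair
        top = ranked[:k]
        if sum(sizes[i] for i in top) >= share * total:
            return sorted(top)
    return []


# A chain of similar, medium-sized blocks: the running count only crosses
# the threshold at the last block, and no block stands out.
chain = [8, 12, 9, 11, 10, 7]
print(cumulative_split_index(chain, 50))  # 5
print(giant_blocks(chain))                # []
print(giant_blocks([6, 340, 5, 4]))       # [1]
```

The cumulative trigger fires on the final (exit) block of the chain, while the size-share criterion only fires when a single block or a pair genuinely dominates the function.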
Weakness of dominator-region method
In the IR, exit points are typically joined by phi nodes, so a cleanly isolated region in terms of dominance is extremely rare.
Loops with large bodies also easily escape dominator-based methods.
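A minimal sketch of the dominance reasoning: the classic iterative dominator computation on a toy CFG, the region dominated by a chosen header, and the edges leaving that region. The CFG, block names and the helpers dominators, dominated_region and exiting_edges are hypothetical; every block, including those without successors, is assumed to appear as a key.

```python
from typing import Dict, List, Set, Tuple

CFG = Dict[str, List[str]]  # block -> successors; every block must be a key


def dominators(cfg: CFG, entry: str) -> Dict[str, Set[str]]:
    """Iterative data-flow: dom(n) = {n} | intersection of dom(p) over preds p."""
    nodes = set(cfg)
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if not preds[n]:
                continue  # unreachable block; left as-is in this sketch
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def dominated_region(dom: Dict[str, Set[str]], header: str) -> Set[str]:
    """All blocks dominated by `header`, including the header itself."""
    return {n for n, ds in dom.items() if header in ds}


def exiting_edges(cfg: CFG, region: Set[str]) -> List[Tuple[str, str]]:
    """Edges from a block inside the region to a block outside it, sorted."""
    return sorted((src, dst) for src in region for dst in cfg[src] if dst not in region)


# Toy CFG: two paths out of `left` rejoin at `merge`, which would hold a phi.
cfg: CFG = {
    "entry": ["hdr"],
    "hdr": ["left", "right"],
    "left": ["merge", "side"],
    "right": ["merge"],
    "side": ["merge"],
    "merge": [],
}
dom = dominators(cfg, "entry")
region = dominated_region(dom, "left")
print(sorted(region))                          # ['left', 'side']
print(exiting_edges(cfg, region))              # [('left', 'merge'), ('side', 'merge')]
print("merge" in dominated_region(dom, "hdr"))  # True
```

The region headed by left leaves through two edges into merge, so the phi there needs a value from each exit, while the region headed by hdr already swallows the function's exit block.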
Current implementation: Block-based
Blocks larger than a certain threshold are tested for outlining, but the outlined region never extends beyond that block.
Its applicability is still severely limited, however: it does not trigger on any of the benchmark inputs.
This is because long blocks are mostly filled with short-range, non-repeating def-use pairs.
For instance, the block for.body6 in function matmul3/matmul has more than 340 instructions, yet the estimated argument count never exceeds 5 at any of the possible split points scanned.
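The block-based check can be sketched in a few lines. This assumes a block is a list of SSA instructions written as (defined value, used values), and that the estimated arguments at split point k are the values used in the suffix block[k:] but defined before it (or live into the block). The helper names, the threshold of 300 and both toy blocks are hypothetical, and values flowing out of the outlined part are ignored.

```python
from typing import Dict, List, Set, Tuple

Instr = Tuple[str, Tuple[str, ...]]  # (value defined, values used), SSA form


def candidate_blocks(blocks: Dict[str, List[Instr]], threshold: int) -> List[str]:
    """Blocks whose instruction count exceeds `threshold`."""
    return [name for name, instrs in blocks.items() if len(instrs) > threshold]


def split_arg_counts(block: List[Instr]) -> List[int]:
    """counts[k - 1] = number of values the suffix block[k:] would take as
    arguments (used there, defined earlier), for each split k in 1..len-1."""
    needed: Set[str] = set()
    counts: List[int] = []
    for k in range(len(block) - 1, 0, -1):  # grow the suffix backwards
        dest, uses = block[k]
        needed.discard(dest)  # now defined inside the suffix, not an argument
        needed.update(uses)   # SSA: an instruction never uses its own result
        counts.append(len(needed))
    return counts[::-1]


# Short-range chain: each value is consumed by the very next instruction.
short_chain: List[Instr] = [("v0", ("base",))] + [
    (f"v{i}", (f"v{i - 1}", "base")) for i in range(1, 341)
]
# Long-range pattern: eight values defined up front, each consumed much later.
long_range: List[Instr] = [(f"x{i}", ("base",)) for i in range(8)] + [
    (f"s{i}", (f"x{i}",)) for i in range(8)
]

blocks = {"big": short_chain, "small": long_range}
for name in candidate_blocks(blocks, threshold=300):
    print(name, max(split_arg_counts(blocks[name])))  # big 2
print(max(split_arg_counts(long_range)))  # 8
```

The short-range chain mirrors the pattern described above: whatever the split point, only the previous value and one live-in cross it. Values defined early and consumed late are what drive the count up.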
Possible improvements
One variation we tried is a combined approach: start by splitting a large block, then gradually expand the outlined region into its successors, aiming for maximal register reuse.
This version currently produces errors such as misaligned addresses, which we hope to resolve later.
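A minimal sketch of the growing step, under loud assumptions: each block is summarized by its def and use sets, the region's estimated arguments are the values used inside it but defined outside, and a successor is absorbed only if all of its predecessors are already in the region (so the region keeps a single entry) and the estimate does not grow. Block, region_inputs and grow_region are hypothetical names; this greedy rule illustrates the idea rather than reproducing our variant, and it does not address the misalignment errors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Block:
    defs: Set[str]                                  # values defined here
    uses: Set[str]                                  # values used here
    succs: List[str] = field(default_factory=list)  # successor block names


def region_inputs(blocks: Dict[str, Block], region: Set[str]) -> Set[str]:
    """Estimated arguments: values used in the region but defined outside it."""
    used = set().union(*(blocks[b].uses for b in region))
    defined = set().union(*(blocks[b].defs for b in region))
    return used - defined


def grow_region(blocks: Dict[str, Block], seed: str) -> Set[str]:
    """Greedily absorb successors that keep a single entry and add no arguments."""
    preds: Dict[str, Set[str]] = {name: set() for name in blocks}
    for name, blk in blocks.items():
        for s in blk.succs:
            preds[s].add(name)
    region = {seed}
    changed = True
    while changed:
        changed = False
        frontier = sorted({s for b in region for s in blocks[b].succs} - region)
        for s in frontier:
            if not preds[s] <= region:
                continue  # s is also reachable from outside: a second entry
            before = len(region_inputs(blocks, region))
            if len(region_inputs(blocks, region | {s})) <= before:
                region.add(s)
                changed = True
    return region


# "tail" stands for the second half of a split large block.
blocks = {
    "tail": Block(defs={"a", "b"}, uses={"x", "y"}, succs=["next"]),
    "next": Block(defs={"c"}, uses={"a", "b", "x"}, succs=["exit", "far"]),
    "side": Block(defs={"d"}, uses={"z"}, succs=["exit"]),
    "far": Block(defs={"e"}, uses={"c", "w"}),
    "exit": Block(defs=set(), uses={"c", "d"}),
}
print(sorted(grow_region(blocks, "tail")))  # ['next', 'tail']
```

Here next is absorbed because it reuses values produced in tail without needing new arguments; exit is rejected because side also branches into it, and far is rejected because it would add w as an extra argument.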