During SSA transformation, most of the run time was spent in `compute_predecessors`. According to perf, the cause was the amount of hashing the algorithm requires.
The following graph shows the execution time of the SSA transformation for a small single-loop program. The control flow graph is obtained by unrolling the loop `k` times (roughly speaking: duplicating the loop body `k` times and adding CFG edges to obtain a loop-free program). The resulting CFG has `8+k*2` basic blocks, so each data point is the execution time of the SSA transformation on `8+k*2` basic blocks.
For `k=150` (308 basic blocks), the SSA transformation with the optimized `compute_predecessors` is about twice as fast as on master.
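
For readers unfamiliar with the technique, here is a minimal sketch of one common way to compute CFG predecessors without hashing. It is an illustration only and is not taken from this change: it assumes basic blocks are identified by dense integer indices, and the names `unrolled_block_count` and the list-based `compute_predecessors` below are hypothetical. The idea is that indexing into a list replaces each hash-map lookup keyed by a block object.

```python
def unrolled_block_count(k):
    """Number of basic blocks in the benchmark CFG after unrolling k times (8+k*2)."""
    return 8 + 2 * k


def compute_predecessors(successors):
    """Invert a successor list into a predecessor list.

    Assumption: blocks are numbered 0..n-1 and successors[i] holds the
    indices of the blocks reachable from block i. Every lookup is a plain
    list index, so no block is ever hashed.
    """
    predecessors = [[] for _ in successors]
    for block, succs in enumerate(successors):
        for succ in succs:
            predecessors[succ].append(block)
    return predecessors


# Tiny example CFG: 0 -> 1, 1 -> 2, 1 -> 3, 2 -> 3
cfg = [[1], [2, 3], [3], []]
print(compute_predecessors(cfg))   # [[], [0], [1], [1, 2]]
print(unrolled_block_count(150))   # 308
```

Whether this matches the actual optimization depends on how blocks are represented in the code base; if blocks have no dense index, assigning one in a single numbering pass is a typical first step.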