Graph: Optimize compute_predecessors

During SSA transformation most of the run-time was spend in compute_predecessors. According to perf, the reason for this was the amount of hashing required in the algorithm.

The following graph shows the execution time of the SSA transformation for a small single-loop program. The control flow graph is obtained by unrolling the loop k times (roughly speaking: duplicating the loop body k times and adding additional CFG edges to obtain a loop-free program). The resulting CFG has 8+k*2 basic blocks, meaning that we measure the execution time of the SSA transformation of 8+k*2 basic blocks.

For k=150 the SSA transformation with the optimized compute_predecessors is about twice as fast as master.

falconre / falcon

Graph: Optimize compute_predecessors #76