Allocation stack lowering based on escape analysis

pcc commented 10 years ago

Now that we can run libgo benchmarks: https://codereview.appspot.com/103550047/ we can measure the effect of optimizations to see if they are worth adding to the compiler.

This introduces an ssaopt package to hold SSA-level optimizations, to which we add an allocation stack lowering pass based on a simple intraprocedural escape analysis. In an experiment comparing libgo benchmark performance before and after this change, this pass yielded a geo-mean 7.1% (5.5-8.7% at a 95% confidence interval) improvement in performance.

axw commented 10 years ago

Very cool, thanks. I have been looking into escape analysis off-and-on, but have been investigating more complex data-flow analysis based approach (based on Choi99, suggested by adonovan). I'll take some time to go through this later properly.

Do you have any thoughts on how to extend this to interprocedural analysis?

pcc commented 10 years ago

I skimmed that paper. It seems like one key principle they use is graph reachability; the algorithm I implemented is based in some ways on graph reachability, which is of course made relatively simple by the SSA representation.

The analysis as implemented seems to be sufficient in many cases; I looked through its output for one of the larger packages (net) and could not immediately spot any cases where the algorithm failed to lower an alloc (given of course the intraprocedural nature of the analysis).

One relatively simple way I can imagine implementing interprocedural analysis relatively well is:

Analyze functions with a bottom-up traversal of the call graph. Disable interprocedural analysis for strongly connected components of the call graph -- handling those would probably be more tricky.
For each function, we generate a summary, to be used to analyze calls in callers. The summary would consist of one bit of information for each parameter, i.e. whether it escapes.
Store summaries in the export data to support inter-package analysis.

Earlier tonight I thought of a case where the transformation in this pull request is unsound (but because of a bug in go/ssa I don't think it can happen in practice). I should have a new version soon but it might depend on some changes to go/ssa.

axw commented 10 years ago

One relatively simple way I can imagine implementing interprocedural analysis relatively well is:

Analyze functions with a bottom-up traversal of the call graph. Disable interprocedural analysis for strongly connected components of the call graph -- handling those would probably be more tricky. For each function, we generate a summary, to be used to analyze calls in callers. The summary would consist of one bit of information for each parameter, i.e. whether it escapes. Store summaries in the export data to support inter-package analysis.

Sounds good. gc's escape analysis does handle mutually recursive functions, but I think it makes sense to get something simple/limited in place and evolve towards that.

pcc commented 10 years ago

Updated this pull request with my bug fix after re-running benchmarks.

It's unclear why the percentage improvement has decreased a little, especially given that the total number of allocations has decreased from my previous run (as a result of the go/ssa bug fix in https://codereview.appspot.com/110210045/ ). I'll attribute it to a quiescence issue on my workstation.

axw commented 10 years ago

Given that the branch-loop-interprocedural thing is academical at this point, LGTM. Thanks.

go-llvm / llgo

Allocation stack lowering based on escape analysis #197