JuliaCI / BaseBenchmarks.jl

A collection of Julia benchmarks available for CI tracking from the JuliaLang/julia repository

add basic benchmarks for Julia-level compilation pipeline #288

Closed · aviatesk closed this 2 years ago

aviatesk commented 2 years ago

This commit sets up basic infrastructure for benchmarking the Julia-level compilation pipeline. InferenceBenchmarks is built on InferenceBenchmarker <: AbstractInterpreter, which maintains its own global inference cache, so we can run the compilation pipeline multiple times without reusing caches generated by previous compilations.
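
As a rough sketch of what that looks like, the snippet below defines an interpreter type that carries its own caches; the field layout and overloads are illustrative only (the Core.Compiler interface is internal and changes between Julia versions) and are not the actual definition in src/inference/InferenceBenchmarks.jl:

const CC = Core.Compiler

# An interpreter that owns its inference cache, so each fresh instance lets the
# compilation pipeline run starting from an empty cache.
struct InferenceBenchmarker <: CC.AbstractInterpreter
    world::UInt
    inf_params::CC.InferenceParams
    opt_params::CC.OptimizationParams
    inf_cache::Vector{CC.InferenceResult}  # interpreter-local inference cache
end
InferenceBenchmarker(; world::UInt = Base.get_world_counter()) =
    InferenceBenchmarker(world, CC.InferenceParams(), CC.OptimizationParams(), CC.InferenceResult[])

# Route the compiler to the interpreter-local state (the code cache plumbing is
# omitted here for brevity).
CC.InferenceParams(interp::InferenceBenchmarker) = interp.inf_params
CC.OptimizationParams(interp::InferenceBenchmarker) = interp.opt_params
CC.get_world_counter(interp::InferenceBenchmarker) = interp.world
CC.get_inference_cache(interp::InferenceBenchmarker) = interp.inf_cache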

I set up a top-level benchmark group named "inference" (InferenceBenchmarks), which is composed of several subgroups, e.g. the "inference" and "optimization" subgroups seen in the results below.
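
As a rough illustration of how such a suite can be laid out with BenchmarkTools, here is a minimal sketch using those two subgroup names; the workload is only a placeholder for a real inference benchmark entry:

using BenchmarkTools

# Top-level "inference" group with per-stage subgroups; each real entry would
# wrap a run of the compilation pipeline through InferenceBenchmarker.
const SUITE = BenchmarkGroup()
g = SUITE["inference"] = BenchmarkGroup()
g["inference"] = BenchmarkGroup()
g["optimization"] = BenchmarkGroup()
# placeholder workload standing in for a real entry such as "sin(42)"
g["inference"]["sin(42)"] = @benchmarkable sin(42) evals=1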

Here is an example of a benchmark result obtained by comparing two commits of JuliaLang/julia, 5c357e9 and d515f05:

# built on 5c357e9
using BenchmarkTools, BaseBenchmarks
BaseBenchmarks.load!("inference")
results = run(BaseBenchmarks.SUITE; verbose = true)
BenchmarkTools.save("5c357e9.json", results)

# built on d515f05
using BenchmarkTools, BaseBenchmarks
BaseBenchmarks.load!("inference")
results = run(BaseBenchmarks.SUITE; verbose = true)
BenchmarkTools.save("d515f05.json", results)

# compare
using BenchmarkTools, BaseBenchmarks
base = BenchmarkTools.load("5c357e9.json")[1]
target = BenchmarkTools.load("d515f05.json")[1]
julia> leaves(regressions(judge(minimum(target), minimum(base))))
Any[]

julia> leaves(improvements(judge(minimum(target), minimum(base))))
6-element Vector{Any}:
 (Any["inference", "inference", "rand(Float64)"], TrialJudgement(-2.85% => invariant))
 (Any["inference", "inference", "sin(42)"], TrialJudgement(-2.44% => invariant))
 (Any["inference", "inference", "abstract_call_gf_by_type"], TrialJudgement(-1.97% => invariant))
 (Any["inference", "inference", "println(::QuoteNode)"], TrialJudgement(-0.96% => invariant))
 (Any["inference", "optimization", "sin(42)"], TrialJudgement(+1.26% => invariant))
 (Any["inference", "optimization", "println(::QuoteNode)"], TrialJudgement(-6.97% => improvement))

This result is very satisfying, because the refactor added in d515f05 indeed improves Julia-level compilation performance by avoiding domtree construction in the SROA pass in many cases.

aviatesk commented 2 years ago

The failure on Julia nightly is because this added benchmark suite hasn't been tuned yet, so it gets tuned on the fly and ends up with evals=2: https://github.com/JuliaCI/BaseBenchmarks.jl/runs/4379154336?check_suite_focus=true#step:5:7094 With evals > 1 the setup no longer runs before every evaluation, and that causes the failure.

I confirmed that this benchmark suite works correctly on my machine.
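
For reference, here is a minimal sketch of how evals interacts with setup in BenchmarkTools; the workload is a placeholder, not the actual inference benchmark:

using BenchmarkTools

# With evals = 1, `setup` runs before every single evaluation, so each run sees
# freshly constructed state. If tuning later raises evals above 1, several
# evaluations share one setup, and later evaluations hit state (e.g. caches)
# left over from earlier ones.
b = @benchmarkable sum(x) setup=(x=rand(1000)) evals=1
run(b)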

vtjnash commented 2 years ago

I think you need to specify evals=1 to @benchmarkable

aviatesk commented 2 years ago

Even though I set it manually here? https://github.com/JuliaCI/BaseBenchmarks.jl/blob/d3861653b3f88125aa5db5506650a352f0b1dece/src/inference/InferenceBenchmarks.jl#L173

vtjnash commented 2 years ago

That will work, assuming no other code later calls tune

aviatesk commented 2 years ago

Ah, evals = 2 is specified for our test case: https://github.com/JuliaCI/BaseBenchmarks.jl/blob/02548823de3a56da5ed9e5d79fef845c2f16d93b/test/runtests.jl#L10-L14