LilithHafner / Chairmarks.jl

Benchmarks with back support
GNU General Public License v3.0

PSA: It is possible to use `BenchmarkTools.BenchmarkGroup` with Chairmarks #70

Open asinghvi17 opened 7 months ago

asinghvi17 commented 7 months ago

Simply replacing @benchmarkable with @be suffices, and you don't have to run tune! or run either!

Even running Statistics.median(suite) works - although any custom plotting utilities might need a couple of tweaks :)
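To make that concrete, roughly this kind of thing works (a minimal sketch; the keys and benchmarked expressions are just placeholders):

```julia
using BenchmarkTools: BenchmarkGroup
using Chairmarks
using Statistics

suite = BenchmarkGroup()
# `@be` runs immediately and stores a Chairmarks result at each leaf
suite["sum"] = @be sum(rand(1000))
suite["sort"] = @be rand(1000) sort   # setup-then-function form

# BenchmarkTools maps `median` over the leaves of a BenchmarkGroup, and
# Chairmarks provides `median` for its results, so this just works
median(suite)
```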

LilithHafner commented 7 months ago

What? I had no idea. This is lovely :)

MilesCranmer commented 7 months ago

Is there a way to take a BenchmarkGroup and "translate" it somehow? Or otherwise run a benchmark suite using Chairmarks instead of BenchmarkTools? Many projects have historically defined their benchmark suite in benchmarks/benchmarks.jl using BenchmarkTools, so to measure performance over time it is perhaps not practical to replace every previous @benchmarkable (especially if they use setup=... in the macro).

I ask specifically in the context of AirspeedVelocity.jl: https://github.com/MilesCranmer/AirspeedVelocity.jl/issues/35

asinghvi17 commented 7 months ago

I mean they both have the same syntax, so an operation at all leaf nodes of that dict could easily reassign the @benchmarkables to the results of @be on that same Expr (if you can directly invoke a macro on an Expr).
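Something along these lines for the tree-walk part, say (the helper name is made up, and the hard part is still getting hold of an Expr that @be can run):

```julia
using BenchmarkTools: BenchmarkGroup

# Hypothetical helper: apply `f` to every leaf of a (possibly nested) BenchmarkGroup.
# `f` is where you would turn each stored benchmark into an `@be` result.
function map_leaves!(f, group::BenchmarkGroup)
    for (key, value) in group
        if value isa BenchmarkGroup
            map_leaves!(f, value)   # recurse into nested groups
        else
            group[key] = f(value)   # reassign the leaf in place
        end
    end
    return group
end
```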

MilesCranmer commented 7 months ago

I think they have slightly different syntax, no?

  Positional argument disambiguation
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

  setup, teardown, and init are optional and are parsed with that precedence giving these possible forms:

  @be f
  @be setup f
  @be setup f teardown
  @be init setup f teardown

whereas @benchmarkable is

help?> @benchmarkable
  @benchmarkable <expr to benchmark> [setup=<setup expr>]

  Create a Benchmark instance for the given expression. @benchmarkable has similar syntax with @benchmark. See also @benchmark.

asinghvi17 commented 7 months ago

Ah, that's true - I hadn't used setup much, but I guess it could also be translated by way of Expr rewriting.

MilesCranmer commented 7 months ago

Wait, I'm a bit confused. @benchmarkable returns a benchmark (that you can then execute). Whereas @be appears to actually return the results of benchmarking the expression. Is that correct?

asinghvi17 commented 7 months ago

Yep! Creating a benchmark suite with @be instead of @benchmarkable gives you the equivalent of that suite after run.

MilesCranmer commented 7 months ago

Oh, but isn't the whole point of @benchmarkable for it to be lazily evaluated, so you can tune! it? And @benchmark for eager evaluation?

asinghvi17 commented 7 months ago

That's true, but if running in a non-interactive framework I don't think it really matters?

Even then, the performance difference is substantial enough that it's actually possible to do semi-interactive workflows with Chairmarks.

MilesCranmer commented 7 months ago

I guess I just wouldn't consider @benchmarkable -> @be to be a complete solution (maybe this deserves a new issue with a feature request). For example, if I need a single benchmark result I would use @benchmark. But if I want a suite of benchmarks that I can store in my REPL and re-run in between Revise.jl-ing my library, then the suite needs to stay lazy. In principle it doesn't seem too bad to add a compatibility layer?
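Just to illustrate the kind of workflow I mean, here is a rough, untested sketch that stores closures in the group rather than being a real compatibility layer:

```julia
using BenchmarkTools: BenchmarkGroup
using Chairmarks

# Keep the suite lazy by storing thunks, then call them whenever fresh
# numbers are wanted, e.g. after Revise has picked up code changes.
suite = BenchmarkGroup()
suite["sum"] = () -> @be sum(rand(1000))
suite["sort"] = () -> @be rand(1000) sort

results = BenchmarkGroup()
for (key, thunk) in suite
    results[key] = thunk()   # actually runs the benchmark
end
```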

This is the `@benchmarkable` code. The logic doesn't seem too complicated; it just generates a function `samplefunc` that returns the recorded time, GC time, memory, and allocations for a particular expression.

```julia
"""
    @benchmarkable <expr to benchmark> [setup=<setup expr>]

Create a `Benchmark` instance for the given expression. `@benchmarkable`
has similar syntax with `@benchmark`. See also [`@benchmark`](@ref).
"""
macro benchmarkable(args...)
    core, setup, teardown, quote_vars, quote_vals, params = benchmarkable_parts(args)
    map!(esc, params, params)

    # extract any variable bindings shared between the core and setup expressions
    setup_vars = isa(setup, Expr) ? collectvars(setup) : []
    core_vars = isa(core, Expr) ? collectvars(core) : []
    out_vars = filter(var -> var in setup_vars, core_vars)

    # generate the benchmark definition
    return quote
        generate_benchmark_definition(
            $__module__,
            $(Expr(:quote, out_vars)),
            $(Expr(:quote, setup_vars)),
            $(Expr(:quote, quote_vars)),
            $(esc(Expr(:tuple, Expr.(:quote, quote_vals)...))),
            $(esc(Expr(:quote, core))),
            $(esc(Expr(:quote, setup))),
            $(esc(Expr(:quote, teardown))),
            Parameters($(params...)),
        )
    end
end

# `eval` an expression that forcibly defines the specified benchmark at
# top-level in order to allow transfer of locally-scoped variables into
# benchmark scope.
#
# The double-underscore-prefixed variable names are not particularly hygienic - it's
# possible for them to conflict with names used in the setup or teardown expressions.
# A more robust solution would be preferable.
function generate_benchmark_definition(
    eval_module, out_vars, setup_vars, quote_vars, quote_vals, core, setup, teardown, params
)
    @nospecialize
    corefunc = gensym("core")
    samplefunc = gensym("sample")
    type_vars = [gensym() for i in 1:(length(quote_vars) + length(setup_vars))]
    signature = Expr(:call, corefunc, quote_vars..., setup_vars...)
    signature_def = Expr(
        :where,
        Expr(
            :call,
            corefunc,
            [
                Expr(:(::), var, type) for
                (var, type) in zip([quote_vars; setup_vars], type_vars)
            ]...,
        ),
        type_vars...,
    )
    if length(out_vars) == 0
        invocation = signature
        core_body = core
    elseif length(out_vars) == 1
        returns = :(return $(out_vars[1]))
        invocation = :($(out_vars[1]) = $(signature))
        core_body = :($(core); $(returns))
    else
        returns = :(return $(Expr(:tuple, out_vars...)))
        invocation = :($(Expr(:tuple, out_vars...)) = $(signature))
        core_body = :($(core); $(returns))
    end
    @static if isdefined(Base, :donotdelete)
        invocation = :(
            let x = $invocation
                Base.donotdelete(x)
                x
            end
        )
    end
    return Core.eval(
        eval_module,
        quote
            @noinline $(signature_def) = begin
                $(core_body)
            end
            @noinline function $(samplefunc)(
                $(Expr(:tuple, quote_vars...)), __params::$BenchmarkTools.Parameters
            )
                $(setup)
                __evals = __params.evals
                __gc_start = Base.gc_num()
                __start_time = time_ns()
                __return_val = $(invocation)
                for __iter in 2:__evals
                    $(invocation)
                end
                __sample_time = time_ns() - __start_time
                __gcdiff = Base.GC_Diff(Base.gc_num(), __gc_start)
                $(teardown)
                __time = max((__sample_time / __evals) - __params.overhead, 0.001)
                __gctime = max((__gcdiff.total_time / __evals) - __params.overhead, 0.0)
                __memory = Int(Base.fld(__gcdiff.allocd, __evals))
                __allocs = Int(
                    Base.fld(
                        __gcdiff.malloc + __gcdiff.realloc + __gcdiff.poolalloc +
                        __gcdiff.bigalloc,
                        __evals,
                    ),
                )
                return __time, __gctime, __memory, __allocs, __return_val
            end
            $BenchmarkTools.Benchmark($(samplefunc), $(quote_vals), $(params))
        end,
    )
end
```

So perhaps an ext/ChairmarksBenchmarkToolsExt.jl could create a @benchmarkable that still uses benchmarkable_parts to extract the pieces, but uses Chairmarks instead for running the thing? I'm not sure how doable this is. Maybe @LilithHafner could share their thoughts.

asinghvi17 commented 7 months ago

Hmm! Yeah, that logic could probably be translated directly to Chairmarks somehow, probably by creating a new macro which stores an Expr object when run (the equivalent of @benchmarkable). It doesn't look like this is hijackable unless we either commit major type piracy by overloading BenchmarkTools.generate_benchmark_definition directly, or offer a function Chairmarks.override_benchmarkable!() to @eval that code in.
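For example, the "new macro" idea could be sketched roughly like this (hypothetical name, untested, and it sidesteps setup/teardown handling entirely):

```julia
using Chairmarks

# Hypothetical lazy analogue of @benchmarkable: instead of running @be now,
# wrap the call in a closure so the stored object can be run (and re-run) later.
macro be_lazy(args...)
    esc(:(() -> Chairmarks.@be $(args...)))
end

# Usage: build the entry lazily, then call it to actually benchmark.
lazy_sort = @be_lazy rand(1000) sort
result = lazy_sort()
```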