LilithHafner / Chairmarks.jl

Benchmarks with back support
GNU General Public License v3.0
74 stars 6 forks source link

Suggestion to Follow BenchmarkTools More Closely #100

Open alfaromartino opened 2 months ago

alfaromartino commented 2 months ago

Just a couple of suggestions for this superb blazingly fast package! In summary, I propose following BenchmarkTools more closely regarding layout.

Thanks for your work!

  1. Adding "0 allocations: 0 bytes" when there are no allocations, instead of printing nothing. This follows @btime and is also consistent when there's at least one allocation with @b. Example
    
    7.107 ns (0 allocations: 0 bytes) # this is @btime
    7.300 ns                          # this is @b

but

34.694 ns (1 allocation: 896 bytes) # this is @btime 33.846 ns (1 allocs: 896 bytes) # this is @b

It might seem trivial, but it becomes important when you have several lines of results, you're new to the package, and for teaching. 

2. Add `display` by default as BenchmarkTools to reduce boilerplate code. Right now, to get the same results when you execute a whole code snippet, we need to explicitly add `display` to see the result. Example:
```julia
using BenchmarkTools
using Chairmarks

x = rand(100)
foo1(x) = x^2
foo2(x) = x^3

# this displays all the results directly when the whole code is executed
@btime foo1.($x)
@btime foo2.($x)

# you need to add display to get the same behavior as above
display(@b foo1.($x))
display(@b foo2.($x))
  1. This is also a suggestion, but probably it's too late. Maybe change @b for a more explicit name? I can imagine that reading code with @b without any context calls for confusion.
LilithHafner commented 2 months ago

Thanks for bringing this up! All of these departures are intentional, the rest of this comment is both a response to you and a draft for inclusion in https://chairmarks.lilithhafner.com/stable/explanations for future reference. I'm open to discussion on all of these points.

When there are conflicts between compatibility/alignment with BenchmarkTools and producing the best experience I can for folks who are not coming for BenchmarkTools or using BenchmarkTools simultaneously, I put much more weight on the latter. One reason for this is folks who want something like BenchmarkTools should use BenchmarkTools, it's a great package that is reliable, mature, and has been stable for a long time. A diversity of design choices lets users pick pakcages based on their own preferences. Another reason for this is that I aim to work toward the best long term benchmarking solution possible (perhaps in some years there will come a time where another package makes both BenchmarkTools.jl and Chairmarks.jl obsolete). To this end, carrying forward design choices I disagree with is not beneficial. All that said, I do not want to break compatibility or change style just to stand out. Almost all of BenchmarkTools' design decisions are solid and worth copying. Things like automatic tuning, the ability to bypass that automatic tuning, a split evals/samples structure, the ability to run untimed setup code before each sample, and many more mundane details we take for granted were once clever design decisions made in BenchmarkTools or its predecessors.

Below, I'll list some specific design departures and why I made them

Chairmarks uses the abbreviated macros @b and @be. Descriptive names are almost always better than terse one-letter names. However I maintain that macros defined in packages and designed to be typed repeatedly at the REPL are one of the few exceptions to this "almost always". At the REPL, these macros are often typed once and never read. In this case, concision does matter and readability does not. When naming these macros I did anticipated this usage would be much more common thanin packages or reused scripts. However, if and as this changes it may be worth adding longer names for them and possibly restricting the shorter names to interactive use only.

@be, like BenchmarkTools.@benchmark, returns a Benchmark object. @b, unlike BenchmarkTools.@btime returns a composite sample formed by computing the minimum statistic over the benchmark, rather than returning the expression result and printing runtime statistics. The reason I originally considered making this decision is that I've typed @btime sort!(x) setup=(x=rand(1000)) evals=1 into the REPL and seen the whole screen fill with random numbers one too many times. But let's consider the etymology of @time to justify this decision further. @time is a lovely macro that can be placed around an arbitrary long-running chunk of code or expression to report its runtime to stdout. @time is the print statement of profiling. @btime and @b can very much not fill that role for three major reasons: first, most long-running code has side effects, and those macros run the code repeatedly, which could break things that rely on their side effects; second, @btime, and to a lesser extent @b, take ages to run; and third, only applying to @btime, @btime runs its body in global scope, not the scope of the caller. @btime and @b are not noninvasive tools to measure runtime of a portion of an algorithm, they are top-level macros to measure the runtime of an expression or function call. Their primary result is the runtime statistics of expression under benchmarking, and the conventional way to report the primary result of a macro of function call to the calling context is with a return value. Consequently @b returns an aggregated benchmark result rather than following the pattern of @btime.

If you are writing a script that computes some values and want to display those values to the user, you generally have to call display. Chairmarks in not an exception. If it were possible, I would consider special-casing @show @b blah.

No Julia objects that I know of display with a leading space on the first line. Sample (returned by @b) is no different. See above for why @b returns a Sample instead of displaying in the style of @time.

The fact that @be does not have leading spaces on trailing lines is an oversight.

BenchmarkTools.jl's short display mode (@btime) displays runtime and allocations. Chairmark's short mode display mode (displaying a sample, or simply @b at the REPL) follows Base.@time instead and captures a wide variety of information, displaying only nonzero values. Here's a selection of the diversity of information Charimarks makes available to users

julia> @b 1+1
1.644 ns

julia> @btime 1+1;
  1.083 ns (0 allocations: 0 bytes)

julia> @b rand(10)
110.621 ns (2 allocs: 144 bytes)

julia> @btime rand(10);
  45.037 ns (2 allocations: 144 bytes)

julia> @b rand(10_000_000)
11.445 ms (3 allocs: 76.294 MiB, 5.43% gc time)

julia> @btime rand(10_000_000);
  8.843 ms (3 allocations: 76.29 MiB)

julia> @b @eval 1+1
88.375 μs (51 allocs: 1.828 KiB)

julia> @btime @eval 1+1;
  34.501 μs (47 allocations: 1.72 KiB)

julia> @b @eval begin f(x) = x+1; f(1) end
1.328 ms (863 allocs: 38.766 KiB, 56.93% compile time)

julia> @btime @eval begin f(x) = x+1; f(1) end;
  1.402 ms (859 allocations: 38.66 KiB)

julia> @time @b sleep(1)
  1.269034 seconds (38.04 k allocations: 1.944 MiB, 20.59% compilation time)
1.002 s (4 allocs: 112 bytes, without a warmup)

julia> @time @btime sleep(1)
  1.002 s (4 allocations: 112 bytes)
 16.438018 seconds (27.50 k allocations: 1.341 MiB, 24.21% gc time, 2.16% compilation time)

It would be a loss restrict ourselves to only runtime and allocations, it would be distracting to include "0% compilation time" in all of these outputs, and it would be inconsistent to make some fields (e.g. allocation count and amount) always display while others are only displayed when non-zero. Sparse display is the compromise I've chosen to get the best of both worlds.

alfaromartino commented 2 months ago

Thanks for your quick response! I completely understand there are trade offs, that's why I labelled them as suggestions.

I'd still insist on the importance of adding "0 allocations". If you see a line like

11.445 ms 

could be easily confused with some option that only provides the time, without indicating any other information.

Many thanks for such a great package! It's saving me tons of time in my workflow.

LilithHafner commented 2 months ago

I prefer following @time rather than @btime, in part because I expect folks to figure out pretty quickly that @b tracks allocations (and other things). There is no functionality I know of anywhere in the Julia ecosystem that prints time in that format (with trailing abbreviated units) and does not track allocations.

Nevertheless, I'm going to solicit other opinions on this matter of opinion :)

Tortar commented 2 days ago

FWIW, I initially agreed with 1., 2., and 3. with @alfaromartino, reconsidering after the great response of @LilithHafner I totally dropped 1. and 2., but I actually need to say that @b and @be are really too short and I prefer the verbosity of BenchmarkTools for the clarity it generates (also in my head). The problematic aspect of simultaneous use is unfortunate though. But I think that it should not happen that often that both packages will be used at the same time, after Chairmarks.jl matures.

LilithHafner commented 2 days ago

What sort of longer names do folks want?

A. @btime/@benchmark does not make sense because @b, unlike BencharmkTools.@btime, neither prints runtime data nor returns the evaluation result B. @benchmark_summary/@benchmark_full works, though very verbose, and precludes adding an even more informative version that saves more data C. @bench/@benchmark is fun and silly and concise while still being clear that we're benchmarking, but still relies on the pun: shorter name = shorter results. D. @youaskedforalongnameyougetalongnamehahahaha/@youaskedforalongnameyougetalongnamehahahaha2 is an option :P E. @benchmark_results/@min_bench an AI generated idea F. @full_bench and @quick_bench another AI idea

I prefer B or C


I would implement this by adding long names as an option and then maybe deprecating the use of short names in scripts. I do not plan to deprecate the use of short names at the REPL.

Tortar commented 2 days ago

I prefer the human generated C. and I also like the pun ;)

alfaromartino commented 2 days ago

Didn't know that specific names/macros could be exclusive to REPL. Keeping @b for the REPL would be great, as the names proposed would be used only for sharing code.

In that case, I think that F. would be a good option (or @full_bench and @brief_bench).

Some other suggestions with the help of AI would @full_bench along with @speedmark @timetest @marktime @timetrial

In any case, I think C or F are fine.