matthias314 opened this issue 8 months ago

In this example, the timings returned by `@b` oscillate between two values, one of which agrees with `@btime`. I only get this on my laptop, not on another machine.

Addition: For `@btime` I once got the higher number when I ran the benchmark for the first time, but I cannot reproduce this.
Very interesting. Thank you for reporting even though it is hard to reproduce.
My initial guess is that this is due to a random slowdown causing the autotuner to pick significantly too few evaluations, which results in a longer reported runtime. This is something that both Chairmarks and BenchmarkTools are susceptible to, but Chairmarks more so because it spends less time tuning.
One possible solution is to verify that the tuning was reasonable using the final estimated runtime, and retry if not.
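One way to sketch that check, using the fraction of the time budget actually spent measuring rather than the runtime estimate itself (the helper name, the 0.5 threshold, and the 3 ms budget are purely illustrative, not Chairmarks internals):

using Chairmarks, StaticArrays

function be_with_retry(v; retries = 3)
    local b
    for _ in 1:retries
        b = @be $v == $v seconds=0.003
        # Fraction of the 3 ms budget actually spent inside measured evaluations;
        # a value near 1.0 suggests the autotuner's choice of evals was reasonable.
        efficiency = sum(s -> s.time * s.evals, b.samples) / 0.003
        efficiency > 0.5 && break    # looks sane; keep this run
    end
    return b
end

be_with_retry(zero(SVector{1,Int16}))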
However, after a fair bit of experimentation, the closest I could get to reproducing this is
julia> v = zero(SVector{1,Int16}); @btime $v == $v; @be $v == $v seconds=.003
1.916 ns (0 allocations: 0 bytes)
Benchmark: 56 samples with 8260 evaluations
min 1.993 ns
median 2.003 ns
mean 2.007 ns
max 2.406 ns
julia> v = zero(SVector{1,Int16}); @btime $v == $v; @be $v == $v seconds=.003
2.041 ns (0 allocations: 0 bytes)
Benchmark: 4 samples with 7769 evaluations
2.183 ns
2.183 ns
2.188 ns
2.188 ns
julia> v = zero(SVector{1,Int16}); @btime $v == $v; @be $v == $v seconds=.003
1.958 ns (0 allocations: 0 bytes)
Benchmark: 54 samples with 8256 evaluations
min 1.993 ns
median 1.999 ns
mean 2.006 ns
max 2.387 ns
julia> v = zero(SVector{1,Int16}); @btime $v == $v; @be $v == $v seconds=.003
1.958 ns (0 allocations: 0 bytes)
Benchmark: 33 samples with 7773 evaluations
min 2.182 ns
median 2.187 ns
mean 2.197 ns
max 2.546 ns
Which is totally inconsistent with that hypothesis. What do you get when you change `@b` to `@be`?
This seems suspicious and related:
julia> for _ in 1:10
v = zero(SVector{1,Int16});
x = @be $v == $v seconds=.003
println(sum(s -> s.time*s.evals, x.samples) / .003) # Efficiency: fraction of the 0.003 s budget spent measuring
end
0.005333333333333334
0.6315463333333339
0.9134653333333339
0.8869929999999999
0.8970749999999997
0.8211156666666665
0.8458399999999997
0.9160763333333332
0.7899360000000001
0.959712333333333
Perhaps it is because compile time is counted against the runtime budget, resulting in significantly fewer samples being taken.
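A quick way to probe that would be to wrap the first call in a fresh session in `@time`, which prints the share of the call spent on compilation (an illustrative check, not something run in this thread):

julia> using Chairmarks, StaticArrays

julia> v = zero(SVector{1,Int16});

julia> @time @be $v == $v seconds=.003   # on the first call, look at the "% compilation time" annotation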
> What do you get when you change `@b` to `@be`?

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
17.638 ns (0 allocations: 0 bytes)
Benchmark: 2699 samples with 1725 evaluations
min 15.839 ns
median 16.171 ns
mean 16.095 ns
max 27.762 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
9.525 ns (0 allocations: 0 bytes)
Benchmark: 3421 samples with 2721 evaluations
min 9.763 ns
median 9.771 ns
mean 9.879 ns
max 22.681 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
9.514 ns (0 allocations: 0 bytes)
Benchmark: 3201 samples with 1564 evaluations
min 16.551 ns
median 18.191 ns
mean 18.113 ns
max 32.007 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
9.521 ns (0 allocations: 0 bytes)
Benchmark: 3410 samples with 2709 evaluations
min 9.766 ns
median 9.778 ns
mean 9.930 ns
max 23.148 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
9.515 ns (0 allocations: 0 bytes)
Benchmark: 3267 samples with 1561 evaluations
min 17.669 ns
median 17.701 ns
mean 17.861 ns
max 29.844 ns
Note that for `@btime` I again get the larger value once at the beginning.
Very strange. This is inconsistent with both the compile-time idea and the tuning idea. To confirm that it is not tuning-related, what if you add keywords:
v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
> what if you add keywords
> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421

Now I almost always get the larger value for `@be` -- and an oscillating behaviour for `@btime`!
julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
9.518 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 9.766 ns
median 10.036 ns
mean 10.170 ns
max 43.306 ns
julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
10.576 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 15.828 ns
median 16.102 ns
mean 16.322 ns
max 45.574 ns
julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
17.644 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 15.829 ns
median 16.258 ns
mean 16.347 ns
max 49.786 ns
julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
10.573 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 15.829 ns
median 16.262 ns
mean 16.319 ns
max 49.806 ns
julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
17.570 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 15.828 ns
median 16.266 ns
mean 16.718 ns
max 43.635 ns
julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
10.572 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 15.829 ns
median 16.019 ns
mean 16.339 ns
max 29.027 ns
wtf. Benchmarking is hard. I have no idea.
What if you put it in a function?
f() = begin v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421 end
and then f() repeatedly.
> What if you put it in a function?

julia> f() = begin v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421 end
f (generic function with 1 method)

julia> f()
9.516 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 17.675 ns
median 17.692 ns
mean 18.746 ns
max 53.658 ns

julia> f()
9.778 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 17.666 ns
median 17.688 ns
mean 18.915 ns
max 52.516 ns

julia> f()
10.578 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 17.663 ns
median 17.688 ns
mean 18.602 ns
max 46.182 ns

julia> f()
9.515 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 17.674 ns
median 17.689 ns
mean 18.355 ns
max 46.739 ns

julia> f()
16.661 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 17.674 ns
median 17.688 ns
mean 18.145 ns
max 41.606 ns

julia> f()
9.516 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 17.674 ns
median 17.691 ns
mean 18.206 ns
max 38.266 ns
Maybe this cannot be debugged ...
I wonder if @jrevels has any ideas, seeing how this affects both BenchmarkTools and Chairmarks.
If you take out the call to BenchmarkTools, are the Chairmark results still slow?
g() = begin v = zero(SVector{14,Int16}); @be $v == $v evals=2721 samples=3421 end
and then g() repeatedly.
> If you take out the call to BenchmarkTools, are the Chairmark results still slow?
Yes, still slow with `g()`, both within the same Julia session as before and in a fresh one.
I wish I could reproduce this on my machine; that would make debugging easier. I've tried unsuccessfully on:
julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (aarch64-linux-gnu)
CPU: 8 × unknown
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, generic)
Threads: 1 on 8 virtual cores
julia> versioninfo()
Julia Version 1.11.0-alpha1
Commit 671de9f5793 (2024-03-01 08:30 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (aarch64-linux-gnu)
CPU: 8 × unknown
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
JULIA_EDITOR = code
julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, nehalem)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 8 × Apple M2
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 on 4 virtual cores
Environment:
JULIA_EDITOR = code
Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 4 × Apple Silicon
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, k8-sse3)
Threads: 1 on 4 virtual cores