matthias314 opened this issue 8 months ago

In this example, the timings returned by `@b` oscillate between two values, one of which agrees with `@btime`. I only get this on my laptop, not on another machine.

Addition: For `@btime` I once got the higher number when I ran the benchmark for the first time, but I cannot reproduce this.
Very interesting. Thank you for reporting even though it is hard to reproduce.
My initial guess is that this is due to a random slowdown causing the autotuner to pick significantly too few evaluations, which results in a longer reported runtime. This is something that both Chairmarks and BenchmarkTools are susceptible to, but Chairmarks more so because it spends less time tuning.
One possible solution is to verify that the tuning was reasonable using the final estimated runtime, and retry if not.
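One way to sketch that check, using the fraction of the time budget actually spent measuring rather than the runtime estimate itself (the helper name, the 0.5 threshold, and the 3 ms budget are purely illustrative, not Chairmarks internals):

using Chairmarks, StaticArrays

function be_with_retry(v; retries = 3)
    local b
    for _ in 1:retries
        b = @be $v == $v seconds=0.003
        # Fraction of the 3 ms budget actually spent inside measured evaluations;
        # a value near 1.0 suggests the autotuner's choice of evals was reasonable.
        efficiency = sum(s -> s.time * s.evals, b.samples) / 0.003
        efficiency > 0.5 && break    # looks sane; keep this run
    end
    return b
end

be_with_retry(zero(SVector{1,Int16}))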
However, after a fair bit of experimentation, the closest I could get to reproducing this is
julia> v = zero(SVector{1,Int16}); @btime $v == $v; @be $v == $v seconds=.003
1.916 ns (0 allocations: 0 bytes)
Benchmark: 56 samples with 8260 evaluations
min 1.993 ns
median 2.003 ns
mean 2.007 ns
max 2.406 ns
julia> v = zero(SVector{1,Int16}); @btime $v == $v; @be $v == $v seconds=.003
2.041 ns (0 allocations: 0 bytes)
Benchmark: 4 samples with 7769 evaluations
2.183 ns
2.183 ns
2.188 ns
2.188 ns
julia> v = zero(SVector{1,Int16}); @btime $v == $v; @be $v == $v seconds=.003
1.958 ns (0 allocations: 0 bytes)
Benchmark: 54 samples with 8256 evaluations
min 1.993 ns
median 1.999 ns
mean 2.006 ns
max 2.387 ns
julia> v = zero(SVector{1,Int16}); @btime $v == $v; @be $v == $v seconds=.003
1.958 ns (0 allocations: 0 bytes)
Benchmark: 33 samples with 7773 evaluations
min 2.182 ns
median 2.187 ns
mean 2.197 ns
max 2.546 ns
Which is totally inconsistent with that hypothesis. What do you get when you change `@b` to `@be`?
This seems suspicious and related:
julia> for _ in 1:10
v = zero(SVector{1,Int16});
x = @be $v == $v seconds=.003
println(sum(s -> s.time*s.evals, x.samples) / .003) # Efficiency: fraction of the 0.003 s budget spent measuring
end
0.005333333333333334
0.6315463333333339
0.9134653333333339
0.8869929999999999
0.8970749999999997
0.8211156666666665
0.8458399999999997
0.9160763333333332
0.7899360000000001
0.959712333333333
Perhaps it is because compile time is counted against the runtime budget, resulting in significantly fewer samples being taken.
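A quick way to probe that would be to wrap the first call in a fresh session in `@time`, which prints the share of the call spent on compilation (an illustrative check, not something run in this thread):

julia> using Chairmarks, StaticArrays

julia> v = zero(SVector{1,Int16});

julia> @time @be $v == $v seconds=.003   # on the first call, look at the "% compilation time" annotation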
> What do you get when you change `@b` to `@be`?

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
17.638 ns (0 allocations: 0 bytes)
Benchmark: 2699 samples with 1725 evaluations
min 15.839 ns
median 16.171 ns
mean 16.095 ns
max 27.762 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
9.525 ns (0 allocations: 0 bytes)
Benchmark: 3421 samples with 2721 evaluations
min 9.763 ns
median 9.771 ns
mean 9.879 ns
max 22.681 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
9.514 ns (0 allocations: 0 bytes)
Benchmark: 3201 samples with 1564 evaluations
min 16.551 ns
median 18.191 ns
mean 18.113 ns
max 32.007 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
9.521 ns (0 allocations: 0 bytes)
Benchmark: 3410 samples with 2709 evaluations
min 9.766 ns
median 9.778 ns
mean 9.930 ns
max 23.148 ns

julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v
9.515 ns (0 allocations: 0 bytes)
Benchmark: 3267 samples with 1561 evaluations
min 17.669 ns
median 17.701 ns
mean 17.861 ns
max 29.844 ns
Note that for `@btime` I again get the larger value once at the beginning.
Very strange. This is inconsistent with both the compile-time idea and the tuning idea. To confirm that it is not tuning-related, what if you add keywords:
v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
> what if you add keywords
> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421

Now I almost always get the larger value for `@be` -- and an oscillating behaviour for `@btime`!
julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
9.518 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 9.766 ns
median 10.036 ns
mean 10.170 ns
max 43.306 ns
julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
10.576 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 15.828 ns
median 16.102 ns
mean 16.322 ns
max 45.574 ns
julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
17.644 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 15.829 ns
median 16.258 ns
mean 16.347 ns
max 49.786 ns
julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
10.573 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 15.829 ns
median 16.262 ns
mean 16.319 ns
max 49.806 ns
julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
17.570 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 15.828 ns
median 16.266 ns
mean 16.718 ns
max 43.635 ns
julia> v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421
10.572 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 15.829 ns
median 16.019 ns
mean 16.339 ns
max 29.027 ns
wtf. Benchmarking is hard. I have no idea.
What if you put it in a function?
f() = begin v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421 end
and then f() repeatedly.
> What if you put it in a function?

julia> f() = begin v = zero(SVector{14,Int16}); @btime $v == $v; @be $v == $v evals=2721 samples=3421 end
f (generic function with 1 method)

julia> f()
9.516 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 17.675 ns
median 17.692 ns
mean 18.746 ns
max 53.658 ns

julia> f()
9.778 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 17.666 ns
median 17.688 ns
mean 18.915 ns
max 52.516 ns

julia> f()
10.578 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 17.663 ns
median 17.688 ns
mean 18.602 ns
max 46.182 ns

julia> f()
9.515 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 17.674 ns
median 17.689 ns
mean 18.355 ns
max 46.739 ns

julia> f()
16.661 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 17.674 ns
median 17.688 ns
mean 18.145 ns
max 41.606 ns

julia> f()
9.516 ns (0 allocations: 0 bytes)
Benchmark: 3420 samples with 2721 evaluations
min 17.674 ns
median 17.691 ns
mean 18.206 ns
max 38.266 ns
Maybe this cannot be debugged ...
I wonder if @jrevels has any ideas, seeing how this affects both BenchmarkTools and Chairmarks.
If you take out the call to BenchmarkTools, are the Chairmark results still slow?
g() = begin v = zero(SVector{14,Int16}); @be $v == $v evals=2721 samples=3421 end
and then g() repeatedly.
> If you take out the call to BenchmarkTools, are the Chairmark results still slow?
Yes, still slow with `g()`, both within the same Julia session as before and in a fresh one.
I wish I could reproduce this on my machine; that would make debugging easier. I've tried unsuccessfully on:
julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (aarch64-linux-gnu)
CPU: 8 × unknown
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, generic)
Threads: 1 on 8 virtual cores
julia> versioninfo()
Julia Version 1.11.0-alpha1
Commit 671de9f5793 (2024-03-01 08:30 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (aarch64-linux-gnu)
CPU: 8 × unknown
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
JULIA_EDITOR = code
julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, nehalem)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 8 × Apple M2
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 on 4 virtual cores
Environment:
JULIA_EDITOR = code
Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: 4 × Apple Silicon
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, k8-sse3)
Threads: 1 on 4 virtual cores