LilithHafner / Chairmarks.jl

Benchmarks with back support
GNU General Public License v3.0
81 stars 8 forks

Does not suppress allocation & GC time as much as BenchmarkTools #78

Closed: XLin0mu closed this issue 6 months ago

XLin0mu commented 6 months ago

Chairmarks.jl doesn't reach its best performance until after the BenchmarkTools methods have been executed. See my test:

julia> using Chairmarks

julia> @b rand(100, 10000, 100)
178.522 ms (2 allocs: 762.940 MiB, 24.89% gc time)

julia> @b rand(100, 10000, 100)
174.507 ms (2 allocs: 762.940 MiB, 24.95% gc time)

julia> using BenchmarkTools

julia> @b rand(100, 10000, 100)
180.840 ms (2 allocs: 762.940 MiB, 24.67% gc time)

julia> @b rand(100, 10000, 100)
172.184 ms (2 allocs: 762.940 MiB, 24.11% gc time)

julia> @btime rand(100, 10000, 100);
  123.355 ms (2 allocations: 762.94 MiB)

julia> @b rand(100, 10000, 100)
126.622 ms (2 allocs: 762.940 MiB)

julia> @b rand(100, 10000, 100)
125.907 ms (2 allocs: 762.940 MiB)

Most strangely, it seems to happen randomly; sometimes this trick doesn't work either. Also, the performance fluctuates randomly even when nothing else is running, sometimes reaching 200% of the base time given by the best-performing @b and @btime. Would a manual way to configure GC help?

LilithHafner commented 6 months ago

You're benchmarking the runtime of a noisy function that takes >100ms with the default benchmarking runtime budget of 100ms. Consequently, all Chairmarks can do while respecting your runtime budget is a single noisy evaluation. When you see strange results, it can often be helpful to run @be instead of @b to see what's going on. On a quiet system with low memory pressure, you may get results like this:

x@fedora:~$ julia
  o  | Version 1.11.0-alpha1 (2024-03-01)
 o o | Official https://julialang.org/ release
julia> using Chairmarks

julia> @be rand(100, 10000, 100)
Benchmark: 2 samples with 1 evaluation
       88.744 ms (3 allocs: 762.940 MiB, 4.49% gc time)
       90.265 ms (3 allocs: 762.940 MiB, 6.25% gc time)

julia> @be rand(100, 10000, 100)
Benchmark: 2 samples with 1 evaluation
       89.377 ms (3 allocs: 762.940 MiB, 4.55% gc time)
       89.461 ms (3 allocs: 762.940 MiB, 5.54% gc time)

julia> using BenchmarkTools

julia> @be rand(100, 10000, 100)
Benchmark: 2 samples with 1 evaluation
       91.414 ms (3 allocs: 762.940 MiB, 4.48% gc time)
       92.910 ms (3 allocs: 762.940 MiB, 6.91% gc time)

julia> @be rand(100, 10000, 100)
Benchmark: 2 samples with 1 evaluation
       90.399 ms (3 allocs: 762.940 MiB, 4.45% gc time)
       90.649 ms (3 allocs: 762.940 MiB, 4.60% gc time)

julia> @btime rand(100, 10000, 100);
  89.504 ms (3 allocations: 762.94 MiB)

julia> @be rand(100, 10000, 100)
Benchmark: 2 samples with 1 evaluation
       91.491 ms (3 allocs: 762.940 MiB, 4.79% gc time)
       95.220 ms (3 allocs: 762.940 MiB, 5.61% gc time)

julia> @be rand(100, 10000, 100)
Benchmark: 2 samples with 1 evaluation
       91.431 ms (3 allocs: 762.940 MiB, 4.82% gc time)
       91.563 ms (3 allocs: 762.940 MiB, 4.73% gc time)

But if you see an unacceptable level of variability for your use case and only a single sample, you should increase the runtime budget.
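To illustrate why a >100ms workload under a 100ms budget yields exactly one sample, here's a rough Base-only sketch of budget-limited sampling (this is illustrative, not Chairmarks internals; `sample_until` and its `seconds` keyword are made-up names):

```julia
# Sketch: keep taking samples until a time budget is exhausted, but always
# take at least one. With a budget smaller than a single evaluation, the
# result is one noisy sample; raising the budget buys more samples.
function sample_until(f; seconds=0.1)
    times = Float64[]
    start = time()
    while isempty(times) || time() - start < seconds
        push!(times, @elapsed f())  # one sample = one timed evaluation
    end
    return times
end
```

With `seconds=0.1` and an `f` that takes 150ms, the loop body runs once and you get a single sample, which is exactly the "Benchmark: 1 sample with 1 evaluation" situation above.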

A manual way for gc configuration may help?

The current syntax for triggering garbage collection before benchmarking is GC.gc(); @b ..., but it doesn't seem to change anything in your case (at least not on my machine). If anything, it adds a wee bit more noise.

julia> using Chairmarks

julia> GC.gc(); @be rand(100, 10000, 100)
Benchmark: 2 samples with 1 evaluation
       90.845 ms (2 allocs: 762.940 MiB, 5.37% gc time)
       98.337 ms (2 allocs: 762.940 MiB, 10.28% gc time)

julia> GC.gc(); @be rand(100, 10000, 100)
Benchmark: 2 samples with 1 evaluation
       90.597 ms (2 allocs: 762.940 MiB, 4.96% gc time)
       91.979 ms (2 allocs: 762.940 MiB, 5.56% gc time)

julia> using BenchmarkTools

julia> GC.gc(); @be rand(100, 10000, 100)
Benchmark: 2 samples with 1 evaluation
       91.146 ms (2 allocs: 762.940 MiB, 5.64% gc time)
       98.766 ms (2 allocs: 762.940 MiB, 10.12% gc time)

julia> GC.gc(); @be rand(100, 10000, 100)
Benchmark: 2 samples with 1 evaluation
       91.029 ms (2 allocs: 762.940 MiB, 5.59% gc time)
       91.210 ms (2 allocs: 762.940 MiB, 4.92% gc time)

julia> @btime rand(100, 10000, 100);
  86.141 ms (2 allocations: 762.94 MiB)

julia> GC.gc(); @be rand(100, 10000, 100)
Benchmark: 2 samples with 1 evaluation
       86.430 ms (2 allocs: 762.940 MiB)
       91.915 ms (2 allocs: 762.940 MiB, 5.50% gc time)

julia> GC.gc(); @be rand(100, 10000, 100)
Benchmark: 2 samples with 1 evaluation
       92.999 ms (2 allocs: 762.940 MiB, 5.36% gc time)
       101.280 ms (2 allocs: 762.940 MiB, 10.08% gc time)

Do you see an improvement when running GC.gc() first?

XLin0mu commented 6 months ago

No, none of them makes any difference. This result is from Julia 1.10.0:

julia> using Chairmarks

julia> @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       152.922 ms (2 allocs: 762.940 MiB, 14.45% gc time)

julia> @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       154.498 ms (2 allocs: 762.940 MiB, 15.21% gc time)

julia>  GC.gc(); @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       150.255 ms (2 allocs: 762.940 MiB, 19.51% gc time)

julia>  GC.gc(); @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       152.417 ms (2 allocs: 762.940 MiB, 15.39% gc time)

julia> using BenchmarkTools

julia> @btime rand(100, 10000, 100);
  118.632 ms (2 allocations: 762.94 MiB)

julia> @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       122.694 ms (2 allocs: 762.940 MiB)

julia> @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       122.843 ms (2 allocs: 762.940 MiB)

And then I tried 1.11.0-alpha1:

julia> using Chairmarks

julia> @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       153.587 ms (3 allocs: 762.940 MiB, 15.18% gc time)

julia> @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       156.794 ms (3 allocs: 762.940 MiB, 15.73% gc time)

julia> GC.gc(); @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       164.607 ms (3 allocs: 762.940 MiB, 13.89% gc time)

julia> GC.gc(); @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       154.849 ms (3 allocs: 762.940 MiB, 14.72% gc time)

julia> using BenchmarkTools

julia> @btime rand(100, 10000, 100);
  124.102 ms (3 allocations: 762.94 MiB)

julia> @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       154.758 ms (3 allocs: 762.940 MiB, 16.03% gc time)

julia> @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       143.493 ms (3 allocs: 762.940 MiB, 13.76% gc time)

julia> GC.gc(); @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       158.053 ms (3 allocs: 762.940 MiB, 14.22% gc time)

julia> GC.gc(); @be rand(100, 10000, 100)
Benchmark: 1 sample with 1 evaluation
       155.258 ms (3 allocs: 762.940 MiB, 14.95% gc time)

Sadly, 1.11.0-alpha1 doesn't help either; it even makes the @btime trick lose its effect. This bug seems strongly correlated with the machine, so maybe we need more samples to locate the source... BTW, my platform is Win10 22H2 with an i7-13700F and 48GB RAM; this bug also occurred on another similar laptop with 32GB RAM and an i7-12700H. (Is it caused by the OS or the RAM?)

LilithHafner commented 6 months ago

You're benchmarking a function with a variable runtime that depends on whether or not GC runs. BenchmarkTools manages to exclude GC time from that benchmark result on your machine. Reliably suppressing GC is not within the scope of Chairmarks at this time. I had hoped GC.gc() would do it, but unfortunately GC is controlled by a large mass of heuristics that are subject to change at any time, and I don't want to touch them or make promises about them.

However, getting reliable measurements in the presence of GC is within scope, provided Chairmarks has an adequate runtime budget. Does setting seconds=5 give you reliable results? This is what I get:

julia> @benchmark rand(100, 10000, 100)
BenchmarkTools.Trial: 55 samples with 1 evaluation.
 Range (min … max):  88.883 ms … 145.906 ms  ┊ GC (min … max): 0.34% … 39.48%
 Time  (median):     90.349 ms               ┊ GC (median):    4.06%
 Time  (mean ± σ):   91.925 ms ±   7.670 ms  ┊ GC (mean ± σ):  5.14% ±  4.95%

     ▁▁█▁  ▁▁                                                   
  ▃▁▅████▆▆███▁▁▃▁▁▁▁▁▁▁▅▁▁▁▃▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  88.9 ms         Histogram: frequency by time          102 ms <

 Memory estimate: 762.94 MiB, allocs estimate: 3.

julia> @benchmark rand(100, 10000, 100)
BenchmarkTools.Trial: 55 samples with 1 evaluation.
 Range (min … max):  88.405 ms … 146.013 ms  ┊ GC (min … max): 0.29% … 39.51%
 Time  (median):     90.681 ms               ┊ GC (median):    4.05%
 Time  (mean ± σ):   91.978 ms ±   7.616 ms  ┊ GC (mean ± σ):  5.15% ±  4.95%

         █ ▃                                                    
  ▃▁▁▁▁▃▆█▇██▇▆▆▆▆▁▆▁▁▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▁
  88.4 ms         Histogram: frequency by time          102 ms <

 Memory estimate: 762.94 MiB, allocs estimate: 3.

julia> @benchmark rand(100, 10000, 100)
BenchmarkTools.Trial: 55 samples with 1 evaluation.
 Range (min … max):  88.008 ms … 145.095 ms  ┊ GC (min … max): 0.29% … 39.38%
 Time  (median):     91.355 ms               ┊ GC (median):    4.01%
 Time  (mean ± σ):   92.522 ms ±   7.450 ms  ┊ GC (mean ± σ):  5.10% ±  4.94%

          ▃█▁ ▆ ▃▆▃  ▁▁                                         
  ▄▁▁▁▁▁▁▄███▇█▁███▁▇██▇▇▄▄▁▇▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  88 ms           Histogram: frequency by time          101 ms <

 Memory estimate: 762.94 MiB, allocs estimate: 3.

julia> @be rand(100, 10000, 100) seconds=5
Benchmark: 56 samples with 1 evaluation
min    88.742 ms (3 allocs: 762.940 MiB, 3.86% gc time)
median 89.156 ms (3 allocs: 762.940 MiB, 3.96% gc time)
mean   89.462 ms (3 allocs: 762.940 MiB, 4.03% gc time)
max    92.473 ms (3 allocs: 762.940 MiB, 6.22% gc time)

julia> @be rand(100, 10000, 100) seconds=5
Benchmark: 56 samples with 1 evaluation
min    88.867 ms (3 allocs: 762.940 MiB, 3.89% gc time)
median 90.241 ms (3 allocs: 762.940 MiB, 3.94% gc time)
mean   90.214 ms (3 allocs: 762.940 MiB, 4.00% gc time)
max    92.214 ms (3 allocs: 762.940 MiB, 5.68% gc time)

julia> @be rand(100, 10000, 100) seconds=5
Benchmark: 55 samples with 1 evaluation
min    88.990 ms (3 allocs: 762.940 MiB, 3.84% gc time)
median 91.496 ms (3 allocs: 762.940 MiB, 4.00% gc time)
mean   91.406 ms (3 allocs: 762.940 MiB, 4.08% gc time)
max    93.989 ms (3 allocs: 762.940 MiB, 5.74% gc time)
XLin0mu commented 6 months ago

Same... Actually the most confusing question for me is why BenchmarkTools' @btime can affect Chairmarks, and why it only works on 1.10.0, not 1.11.0-alpha1. Here's the result from 1.10.0:

julia> using Chairmarks

julia> @be rand(100, 10000, 100) seconds=15
Benchmark: 97 samples with 1 evaluation
min    151.312 ms (2 allocs: 762.940 MiB, 7.91% gc time)
median 155.250 ms (2 allocs: 762.940 MiB, 8.28% gc time)
mean   155.883 ms (2 allocs: 762.940 MiB, 8.35% gc time)
max    161.821 ms (2 allocs: 762.940 MiB, 13.66% gc time)

julia> using BenchmarkTools

julia> @btime rand(100, 10000, 100);
  117.606 ms (2 allocations: 762.94 MiB)

julia> @be rand(100, 10000, 100) seconds=15
Benchmark: 90 samples with 1 evaluation
min    117.882 ms (2 allocs: 762.940 MiB)
median 159.822 ms (2 allocs: 762.940 MiB)
mean   158.489 ms (2 allocs: 762.940 MiB, 7.16% gc time)
max    284.773 ms (2 allocs: 762.940 MiB, 16.94% gc time)

(the min time cost changed permanently after executing @btime) and from 1.11.0-alpha1:

julia> using Chairmarks

julia> @be rand(100, 10000, 100) seconds=15
Benchmark: 93 samples with 1 evaluation
min    140.112 ms (3 allocs: 762.940 MiB, 7.58% gc time)
median 152.847 ms (3 allocs: 762.940 MiB, 8.48% gc time)
mean   153.241 ms (3 allocs: 762.940 MiB, 8.59% gc time)
max    169.418 ms (3 allocs: 762.940 MiB, 17.33% gc time)

julia> using BenchmarkTools

julia> @btime rand(100, 10000, 100);
  116.821 ms (3 allocations: 762.94 MiB)

julia> @be rand(100, 10000, 100) seconds=15
Benchmark: 98 samples with 1 evaluation
min    142.762 ms (3 allocs: 762.940 MiB, 7.67% gc time)
median 152.164 ms (3 allocs: 762.940 MiB, 8.51% gc time)
mean   153.537 ms (3 allocs: 762.940 MiB, 8.50% gc time)
max    170.064 ms (3 allocs: 762.940 MiB, 9.28% gc time)

Quite a strange thing, almost magical.

LilithHafner commented 6 months ago

I can reproduce strange and unpredictable GC behavior on 1.10.0 and 1.10.2 once I increase the memory pressure on my system. Pardon the hard-to-read logs; I still don't know exactly how to reproduce this, so I'm copying exact logs when I succeed.

julia> using Chairmarks

julia> @be rand(100, 10000, 100) seconds=1
Benchmark: 1 sample with 1 evaluation
       663.473 ms (2 allocs: 762.940 MiB, 72.74% gc time)

julia> @be rand(100, 10000, 100) seconds=1
Benchmark: 2 samples with 1 evaluation
       750.043 ms (2 allocs: 762.940 MiB, 78.03% gc time)
       1.275 s (2 allocs: 762.940 MiB, 88.27% gc time)

julia> for _ in 1:30
           rand(100, 10000, 100)
       end

julia> @be rand(100, 10000, 100) seconds=1
Benchmark: 4 samples with 1 evaluation
       97.464 ms (2 allocs: 762.940 MiB, 4.30% gc time)
       100.342 ms (2 allocs: 762.940 MiB, 9.47% gc time)
       322.246 ms (2 allocs: 762.940 MiB)
       357.622 ms (2 allocs: 762.940 MiB, 70.14% gc time)

julia> @be rand(100, 10000, 100) seconds=1
Benchmark: 10 samples with 1 evaluation
min    89.993 ms (2 allocs: 762.940 MiB, 3.97% gc time)
median 95.832 ms (2 allocs: 762.940 MiB, 5.44% gc time)
mean   99.304 ms (2 allocs: 762.940 MiB, 6.43% gc time)
max    124.403 ms (2 allocs: 762.940 MiB, 15.31% gc time)

julia> using BenchmarkTools

julia> @btime rand(100, 10000, 100);
  120.989 ms (2 allocations: 762.94 MiB)

julia> @be rand(100, 10000, 100) seconds=1
Benchmark: 2 samples with 1 evaluation
       558.540 ms (2 allocs: 762.940 MiB)
       1.197 s (2 allocs: 762.940 MiB, 86.95% gc time)

julia> @time rand(100, 10000, 100);
  0.169106 seconds (2 allocations: 762.940 MiB)

julia> @time rand(100, 10000, 100);
  0.521368 seconds (2 allocations: 762.940 MiB, 61.98% gc time)

julia> @time rand(100, 10000, 100);
  0.112622 seconds (2 allocations: 762.940 MiB)

julia> @time rand(100, 10000, 100);
  0.531731 seconds (2 allocations: 762.940 MiB, 61.01% gc time)

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (aarch64-linux-gnu)
  CPU: 8 × unknown
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, generic)
  Threads: 1 on 8 virtual cores

----------------------

x@fedora:~$ julia +release --startup-file=no
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _' |  |
  | | |_| | | | (_| |  |  Version 1.10.2 (2024-03-01)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using Chairmarks

julia> for size in [100, 300, 1000, 3000, 10000]
          # (@b rand(100, size, 100)).gc_fraction == 0 && println(size)
           display(@b rand(100, size, 100))
       end
577.586 μs (2 allocs: 7.629 MiB)
1.787 ms (2 allocs: 22.888 MiB)
8.989 ms (2 allocs: 76.294 MiB, 4.03% gc time)
29.419 ms (2 allocs: 228.882 MiB, 4.76% gc time)
94.930 ms (2 allocs: 762.940 MiB, 4.78% gc time)

julia> for size in [100, 300, 1000, 3000, 10000]
          # (@b rand(100, size, 100)).gc_fraction == 0 && println(size)
           display(@b rand(100, size, 100))
       end
848.213 μs (2 allocs: 7.629 MiB)
1.849 ms (2 allocs: 22.888 MiB)
9.638 ms (2 allocs: 76.294 MiB, 4.31% gc time)
30.085 ms (2 allocs: 228.882 MiB, 5.20% gc time)
99.414 ms (2 allocs: 762.940 MiB, 5.75% gc time)

julia> using BenchmarkTools

julia> @b rand(100, 10000, 100)
95.285 ms (2 allocs: 762.940 MiB, 8.78% gc time)

julia> @b rand(100, 10000, 100)
92.532 ms (2 allocs: 762.940 MiB, 6.28% gc time)

julia> @btime rand(100, 10000, 100);
  85.244 ms (2 allocations: 762.94 MiB)

julia> @b rand(100, 10000, 100)
90.619 ms (2 allocs: 762.940 MiB)
<increase memory pressure>
julia> @b rand(100, 10000, 100)
649.771 ms (2 allocs: 762.940 MiB, 82.25% gc time)

julia> @b rand(100, 10000, 100)
244.912 ms (2 allocs: 762.940 MiB, without a warmup)

julia> @b rand(100, 10000, 100)
226.078 ms (2 allocs: 762.940 MiB, 58.83% gc time, without a warmup)

julia> @b rand(100, 10000, 100)
201.505 ms (2 allocs: 762.940 MiB, 52.92% gc time)

julia> @b rand(100, 10000, 100)
295.420 ms (2 allocs: 762.940 MiB, without a warmup)

julia> @b rand(100, 10000, 100)
224.846 ms (2 allocs: 762.940 MiB, 58.04% gc time, without a warmup)

julia> @b rand(100, 10000, 100)
612.017 ms (2 allocs: 762.940 MiB, 79.08% gc time)

julia> @b rand(100, 10000, 100) seconds=2
259.160 ms (2 allocs: 762.940 MiB)

julia> @b rand(100, 10000, 100) seconds=2
134.677 ms (2 allocs: 762.940 MiB)

julia> @b rand(100, 10000, 100) seconds=2
259.993 ms (2 allocs: 762.940 MiB)

julia> @btime rand(100, 10000, 100)
  90.647 ms (2 allocations: 762.94 MiB)
100×10000×100 Array{Float64, 3}:
[output]

julia> @btime rand(100, 10000, 100);
  88.128 ms (2 allocations: 762.94 MiB)

julia> @b rand(100, 10000, 100) seconds=10
86.466 ms (2 allocs: 762.940 MiB)

julia> @b rand(100, 10000, 100) seconds=10
89.588 ms (2 allocs: 762.940 MiB)

julia> @macroexpand @btime rand(100, 10000, 100);

julia> @macroexpand @btime rand(100, 10000, 100)
quote
    #= /home/x/.julia/packages/BenchmarkTools/QAv7m/src/execution.jl:651 =#
    local var"##251" = begin
                #= /home/x/.julia/packages/BenchmarkTools/QAv7m/src/execution.jl:482 =#
                BenchmarkTools.generate_benchmark_definition(Main, Symbol[], Any[], Symbol[], (), $(Expr(:copyast, :($(QuoteNode(:(rand(100, 10000, 100))))))), $(Expr(:copyast, :($(QuoteNode(nothing))))), $(Expr(:copyast, :($(QuoteNode(nothing))))), BenchmarkTools.Parameters())
            end
    #= /home/x/.julia/packages/BenchmarkTools/QAv7m/src/execution.jl:652 =#
    (BenchmarkTools).warmup(var"##251")
    #= /home/x/.julia/packages/BenchmarkTools/QAv7m/src/execution.jl:653 =#
    (BenchmarkTools).tune!(var"##251")
    #= /home/x/.julia/packages/BenchmarkTools/QAv7m/src/execution.jl:654 =#
    local (var"##252", var"##253") = (BenchmarkTools).run_result(var"##251")
    #= /home/x/.julia/packages/BenchmarkTools/QAv7m/src/execution.jl:655 =#
    local var"##254" = (BenchmarkTools).minimum(var"##252")
    #= /home/x/.julia/packages/BenchmarkTools/QAv7m/src/execution.jl:656 =#
    local var"##255" = (BenchmarkTools).allocs(var"##254")
    #= /home/x/.julia/packages/BenchmarkTools/QAv7m/src/execution.jl:657 =#
    println("  ", (BenchmarkTools).prettytime((BenchmarkTools).time(var"##254")), " (", var"##255", " allocation", if var"##255" == 1
            ""
        else
            "s"
        end, ": ", (BenchmarkTools).prettymemory((BenchmarkTools).memory(var"##254")), ")")
    #= /home/x/.julia/packages/BenchmarkTools/QAv7m/src/execution.jl:668 =#
    var"##253"
end

julia> @b rand(100, 10000, 100) seconds=10
85.512 ms (2 allocs: 762.940 MiB)

julia> @b rand(100, 10000, 100) seconds=2
134.735 ms (2 allocs: 762.940 MiB)

julia> @b rand(100, 10000, 100) seconds=2
145.472 ms (2 allocs: 762.940 MiB)

----------------

julia> @be rand(100, 10000, 100) seconds=10
Benchmark: 14 samples with 1 evaluation
min    217.497 ms (2 allocs: 762.940 MiB)
median 523.573 ms (2 allocs: 762.940 MiB, 24.84% gc time)
mean   680.660 ms (2 allocs: 762.940 MiB, 30.91% gc time)
max    2.090 s (2 allocs: 762.940 MiB, 72.89% gc time)

julia> @benchmark rand(100, 10000, 100)
BenchmarkTools.Trial: 47 samples with 1 evaluation.
 Range (min … max):   84.599 ms … 168.411 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):      96.469 ms               ┊ GC (median):    7.84%
 Time  (mean ± σ):   100.978 ms ±  16.851 ms  ┊ GC (mean ± σ):  8.95% ± 9.43%

  ▂█      ▂▂                                                     
  ███▄▆█▁▄██▆▁▁▁▄▄▆▄▄▄▁▁▁▄▄▁▁▁▄▄▆▁▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  84.6 ms          Histogram: frequency by time          168 ms <

 Memory estimate: 762.94 MiB, allocs estimate: 2.

julia> @be rand(100, 10000, 100) seconds=10
Benchmark: 60 samples with 1 evaluation
min    87.863 ms (2 allocs: 762.940 MiB)
median 147.470 ms (2 allocs: 762.940 MiB, 7.08% gc time)
mean   149.320 ms (2 allocs: 762.940 MiB, 18.94% gc time)
max    340.094 ms (2 allocs: 762.940 MiB, 72.89% gc time)

There's a lot going on here, some of which I'd like to fix, and some of which I either don't want to fix or don't think is plausible to fix:

  1. Chairmarks sometimes reports a higher runtime than BenchmarkTools. This is not a bug unless we can demonstrate that BenchmarkTools' lower reported result is more accurate than Chairmarks'.
  2. Chairmarks reports results that vary based on factors other than the code being run in a way that makes it harder to reliably interpret results. This is a problem, and as much as possible, I'd like to fix it. However, for some usage patterns it may not be possible for Chairmarks to prevent a user from seeing strange and hard to interpret results reflective of a strange reality.
  3. Reduced accuracy on out of date versions of Julia is not a priority for me unless it either renders the package totally unusable (nothing here does) or is indicative of issues that likely persist in the latest versions of Julia (I suspect this may be the case here)
  4. I want Chairmarks to report how long a function takes to run. If some strange magic affects the actual runtime, I want that to be reflected by strange results.

I think that variable allocation and GC time is what is causing all this. In a system with high memory pressure, allocation and GC time depend on how "available" memory is to the process and to the running Julia code. This availability may be at the OS or the Julia GC level, I'm not sure. In any event, something in the @btime call (either repeated allocations or repeated calls to GC.gc(), I'm guessing) makes memory more quickly available to the running Julia code. This, in turn, makes calls to rand(100, 10000, 100) actually run faster, which Chairmarks reports.
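One way to observe this variability directly, without either benchmarking package, is to look at the GC fraction of individual calls using only Base (a sketch; `@timed`'s named fields `time` and `gctime` are available on Julia ≥ 1.5):

```julia
# Sketch: compute the fraction of a call's wall time spent in garbage
# collection. Base's @timed returns a NamedTuple whose `time` field is
# total seconds elapsed and whose `gctime` field is seconds spent in GC.
function gc_fraction_of(f)
    stats = @timed f()
    return stats.gctime / stats.time
end
```

Under memory pressure this fraction jumps around from run to run, which is the same variability showing up in the benchmark reports above.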

My remaining questions, then, are

  1. How is BenchmarkTools making this memory more quickly available?
  2. Is that increased memory availability reflective of "real" usage?
  3. Regardless of whether or not BenchmarkTools's reported results are more accurate or reflective of reality, are they more useful in, for example, comparing two alternative implementations of rand which allocate the same amount to see which is faster?
XLin0mu commented 6 months ago

Chairmarks sometimes reports a higher runtime than BenchmarkTools. This is not a bug unless we can demonstrate that BenchmarkTools' lower reported result is more accurate than Chairmarks'.

The printed time cost is not a rigorous concept, but many people rely on it and usually prefer a shorter one. Fortunately, on this point Chairmarks actually does well. It seems the problem in this issue is not a general situation for Chairmarks. (See below.)

Reduced accuracy on out of date versions of Julia is not a priority for me unless it either renders the package totally unusable (nothing here does) or is indicative of issues that likely persist in the latest versions of Julia (I suspect this may be the case here)

Yes, and more on this: I think improving result stability (currently not as good as BenchmarkTools) when benchmarking resource-heavy tasks has higher priority than figuring out why the Julia version affects these results. That said, comparing the differences between these versions may help in finding the problem.

The reason I chose BenchmarkTools as the standard for testing Chairmarks is that its results are typically more stable than Chairmarks' on this particular kind of task. In other situations, Chairmarks can work better, e.g. on low-resource benchmarks, and this is not attributable to the number of samples:

julia> using Chairmarks, BenchmarkTools

julia> @be findSth()
4314 milliseconds
Benchmark: 1 sample with 1 evaluation
       4.315 s (19 allocs: 632 bytes, without a warmup)

julia> @btime findSth()
4322 milliseconds
4316 milliseconds
4315 milliseconds
4315 milliseconds
4324 milliseconds
5091 milliseconds
  4.325 s (12 allocations: 376 bytes)

findSth() is a function that calls rand()/randn() and prints its own time cost. Although BenchmarkTools ran findSth many times while Chairmarks ran it only once, their results are close, and Chairmarks' printed time cost is even shorter than BenchmarkTools'.

I also tried the test below from two angles: the number of samples, and the resource requirements:

julia> @btime rand(100, 10000, 100);
  124.924 ms (3 allocations: 762.94 MiB)
# we assume this is the standard temporarily

julia> @be rand(100, 10000, 100) seconds=15
Benchmark: 90 samples with 1 evaluation
min    163.741 ms (3 allocs: 762.940 MiB, 7.98% gc time)
median 166.733 ms (3 allocs: 762.940 MiB, 8.32% gc time)
mean   166.977 ms (3 allocs: 762.940 MiB, 8.31% gc time)
max    173.272 ms (3 allocs: 762.940 MiB, 8.70% gc time)

julia> @be rand(100, 10000, 100) seconds=150
Benchmark: 849 samples with 1 evaluation
min    162.024 ms (3 allocs: 762.940 MiB, 6.90% gc time)
median 166.415 ms (3 allocs: 762.940 MiB, 8.20% gc time)
mean   167.998 ms (3 allocs: 762.940 MiB, 8.24% gc time)
max    201.978 ms (3 allocs: 762.940 MiB, 18.03% gc time)
# 124.924 vs 162.024: an obvious gap
# the 90-sample result is not that different from the 849-sample result, especially the min time

julia> @btime rand(100, 1000, 100);
  12.425 ms (3 allocations: 76.29 MiB)

julia> @be rand(100, 1000, 100) seconds=15
Benchmark: 847 samples with 1 evaluation
min    13.127 ms (3 allocs: 76.294 MiB, 4.52% gc time)
median 16.685 ms (3 allocs: 76.294 MiB, 9.60% gc time)
mean   16.730 ms (3 allocs: 76.294 MiB, 9.79% gc time)
max    35.706 ms (3 allocs: 76.294 MiB, 15.44% gc time)
# 12.425 vs 13.127: closer

julia> @btime rand(100, 100, 100);
  1.154 ms (3 allocations: 7.63 MiB)

julia> @be rand(100, 100, 100) seconds=1.5
Benchmark: 897 samples with 1 evaluation
min    1.160 ms (3 allocs: 7.629 MiB)
median 1.669 ms (3 allocs: 7.629 MiB, 15.98% gc time)
mean   1.631 ms (3 allocs: 7.629 MiB, 12.08% gc time)
max    3.863 ms (3 allocs: 7.629 MiB, 68.38% gc time)
# 1.154 vs 1.160: almost the same; at this scale we can consider them equal

Hope this helps.

LilithHafner commented 6 months ago

Would you please try to reproduce the OP with Chairmarks 1.2 & the gc=false and seconds=5 keywords?

This is what I get, but I've never been able to reproduce the OP reliably anyway, so I don't know if it will also be fixed for you:

julia> using Chairmarks

julia> @b rand(100, 10000, 100) gc=false seconds=5
85.905 ms (3 allocs: 762.940 MiB)

julia> @b rand(100, 10000, 100) gc=false seconds=5
86.125 ms (3 allocs: 762.940 MiB)

julia> using BenchmarkTools

julia> @b rand(100, 10000, 100) gc=false seconds=5
86.244 ms (3 allocs: 762.940 MiB)

julia> @b rand(100, 10000, 100) gc=false seconds=5
86.590 ms (3 allocs: 762.940 MiB)

julia> @btime rand(100, 10000, 100);
  86.157 ms (3 allocations: 762.94 MiB)

julia> @b rand(100, 10000, 100) gc=false seconds=5
86.574 ms (3 allocs: 762.940 MiB)

julia> @b rand(100, 10000, 100) gc=false seconds=5
86.688 ms (3 allocs: 762.940 MiB)

julia> import Pkg; Pkg.TOML.parsefile(joinpath(pkgdir(Chairmarks), "Project.toml"))["version"]
"1.2.0"
XLin0mu commented 6 months ago

Thanks, it's fixed. Anyway, it works for this issue.