bheisler / criterion.rs

Statistics-driven benchmarking library for Rust
Apache License 2.0
4.31k stars 292 forks source link

criterion panicking with `slice.len() > 1 && slice.iter().all(|x| !x.is_nan())` #720

Open vlovich opened 10 months ago

vlovich commented 10 months ago

Put up a minimal repro here: https://github.com/vlovich/criterion-bug-repro

Notice below that the 5 Gelem/s number is total garbage (+ the NaN in the output). The "Took Xs or N Kelements/s" is printed by the benchmark directly. So something is causing garbage in criterion's measurements (looks like something is injecting NaN).

When HTML reports are enabled, the benchmark crashes because of the NaNs.

Maybe the problem is somehow too many elements?

$ RUST_BACKTRACE=1 cargo bench --bench ordsearch -- --quick 
   Compiling ordsearch-benchmark v0.1.0 (/home/vlovich/projects/criterion-bug-repro)
    Finished bench [optimized] target(s) in 0.59s
     Running benches/ordsearch.rs (target/release/deps/ordsearch-145433b4158ff7a7)
Benchmarking Lookup Filter Fixed Keys/ordsearch/5000000/5
Took 6.064670186s or 824.4471416668724 Kelements / s
Lookup Filter Fixed Keys/ordsearch/5000000/5
                        time:   [1.0000 ms 1.0000 ms 1.0000 ms]
                        thrpt:  [5.0000 Gelem/s 5.0000 Gelem/s 5.0000 Gelem/s]
                 change:
                        time:   [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
                        thrpt:  [+0.0000% +0.0000% +0.0000%]
                        No change in performance detected.
Benchmarking Lookup Filter Fixed Keys/ordsearch/5000000/11
Took 6.703530679s or 745.875604875411 Kelements / s
Lookup Filter Fixed Keys/ordsearch/5000000/11
                        time:   [1.0000 ms 1.0000 ms 1.0000 ms]
                        thrpt:  [5.0000 Gelem/s 5.0000 Gelem/s 5.0000 Gelem/s]
                 change:
                        time:   [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
                        thrpt:  [+0.0000% +0.0000% +0.0000%]
                        No change in performance detected.
Benchmarking Lookup Filter Fixed Keys/ordsearch/4823449/20
Took 6.425890771s or 750.6272938482228 Kelements / s
Lookup Filter Fixed Keys/ordsearch/4823449/20
                        time:   [1.0000 ms 1.0000 ms 1.0000 ms]
                        thrpt:  [4.8234 Gelem/s 4.8234 Gelem/s 4.8234 Gelem/s]
                 change:
                        time:   [+0.0000% +0.0000% +0.0000%] (p = NaN > 0.05)
                        thrpt:  [+0.0000% +0.0000% +0.0000%]
                        No change in performance detected.
Benchmarking Lookup Filter Fixed Keys/ordsearch/3215633/30
Took 3.926567864s or 818.9424228425866 Kelements / s

Took 3.852148075s or 834.7636013446861 Kelements / s

Took 3.933135121s or 817.5750135892674 Kelements / s
Lookup Filter Fixed Keys/ordsearch/3215633/30
                        time:   [3.8927 s 3.8994 s 3.9266 s]
                        thrpt:  [818.94 Kelem/s 824.64 Kelem/s 826.08 Kelem/s]

HTML report crash (backend doesn't matter - both gnuplot and plotter fail the same way).

$ RUST_BACKTRACE=1 cargo bench --bench ordsearch -F html_reports -- --quick 
    Finished bench [optimized] target(s) in 0.01s
     Running benches/ordsearch.rs (target/release/deps/ordsearch-99eff1a7768d6938)
Benchmarking Lookup Filter Fixed Keys/ordsearch/5000000/5
Took 5.69630772s or 877.7615686815459 Kelements / s
Lookup Filter Fixed Keys/ordsearch/5000000/5
                        time:   [1.0000 ms 1.0000 ms 1.0000 ms]
                        thrpt:  [5.0000 Gelem/s 5.0000 Gelem/s 5.0000 Gelem/s]
thread 'main' panicked at 'assertion failed: slice.len() > 1 && slice.iter().all(|x| !x.is_nan())', /home/vlovich/.cargo/registry/src/index.crates.io-6f17d22bba15001f/criterion-0.4.0/src/stats/univariate/sample.rs:31:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14
   2: core::panicking::panic
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:117:5
   3: criterion::plot::gnuplot_backend::pdf::pdf_small
   4: <criterion::plot::gnuplot_backend::Gnuplot as criterion::plot::Plotter>::pdf
   5: <criterion::html::Html as criterion::report::Report>::measurement_complete
   6: criterion::analysis::common
   7: criterion::benchmark_group::BenchmarkGroup<M>::bench_with_input
   8: ordsearch::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: bench failed, to rerun pass `--bench ordsearch`
vlovich commented 10 months ago

It's entirely possible that repro case may not 100% repro as written on random machines. It looks like the number of elements & maybe the speed of the machine is relevant (I'm running that benchmark on a 13900K).

I'm curious if my absolutely blind ignorant hunch is that there's integer math somewhere that ends up with a 0 due to some integer division and is then the numerator for a floating point operation which results in a NaN. I'd love to help track down the bug but I don't know where to start.

vlovich commented 8 months ago

So tracked this down a bit further. Looks like the values being plotted are all the same. This results in a stddev of 0 when estimating the KDE bandwidth within criterion::kde::sweep_and_estimate which results in a bandwidth of 0 which problematic because bandwidth is in the denominator & hence the NaN suddenly appearing. Not sure why all the samples have exactly the same value though...

vlovich commented 8 months ago

Hah. There's even a comment trying to address this.

116                 // prevent gnuplot bug when all values are equal
117                 let elapsed = vec![t_prev, t_prev + 0.000001].into_boxed_slice();

Likely what's happening is that when values are large (in my case the test is taking 30s), the + 0.000001 does nothing.

Replacing it with

let elapsed = vec![t_prev, next_up(t_prev)].into_boxed_slice();

where next_up is ported from the std library fixes the issue I think (at least for this repro that I have).