Open vlovich opened 10 months ago
It's entirely possible that the repro case won't reproduce 100% as written on random machines. It looks like the number of elements & maybe the speed of the machine are relevant (I'm running that benchmark on a 13900K).
My absolutely blind, ignorant hunch is that there's integer math somewhere that ends up with a 0 due to some integer division, and that 0 is then the numerator of a floating point operation which results in a NaN. I'd love to help track down the bug but I don't know where to start.
So I tracked this down a bit further. It looks like the values being plotted are all the same. This results in a stddev of 0 when estimating the KDE bandwidth within criterion::kde::sweep_and_estimate, which in turn results in a bandwidth of 0. That's problematic because the bandwidth is in the denominator, hence the NaN suddenly appearing. Not sure why all the samples have exactly the same value though...
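To make the failure mode concrete, here is a minimal sketch of how identical samples can turn into NaN in a Gaussian KDE. It is not criterion's actual code: the helper names (`std_dev`, `silverman_bandwidth`, `kde`) are hypothetical, and using Silverman's rule of thumb for the bandwidth is an assumption made for illustration.

```rust
/// Sample standard deviation (n - 1 denominator). Hypothetical helper.
fn std_dev(xs: &[f64]) -> f64 {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    var.sqrt()
}

/// Silverman's rule-of-thumb bandwidth: sigma * (4 / (3n))^(1/5).
/// Assumed here for illustration; if sigma is 0, the bandwidth is 0 too.
fn silverman_bandwidth(xs: &[f64]) -> f64 {
    let n = xs.len() as f64;
    std_dev(xs) * (4.0 / (3.0 * n)).powf(0.2)
}

/// Gaussian kernel density estimate at `x` with bandwidth `h`.
/// `h` appears in two denominators, so h == 0 produces 0/0 = NaN.
fn kde(xs: &[f64], h: f64, x: f64) -> f64 {
    let n = xs.len() as f64;
    let norm = (2.0 * std::f64::consts::PI).sqrt();
    let sum: f64 = xs
        .iter()
        .map(|xi| {
            let u = (x - xi) / h;
            (-0.5 * u * u).exp() / norm
        })
        .sum();
    sum / (n * h)
}
```

With samples that are all equal, `silverman_bandwidth` returns 0 and `kde` returns NaN at every point. With distinct samples such as `[1.0, 2.0, 3.0]` the estimate stays finite.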
Hah. There's even a comment trying to address this.
// prevent gnuplot bug when all values are equal
let elapsed = vec![t_prev, t_prev + 0.000001].into_boxed_slice();
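Likely what's happening is that when values are large (in my case the test is taking 30s), the + 0.000001 does nothing, presumably because it is smaller than the gap between adjacent floating point values at that magnitude.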
Replacing it with
let elapsed = vec![t_prev, next_up(t_prev)].into_boxed_slice();
where next_up is ported from the std library fixes the issue I think (at least for this repro that I have).
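As a rough illustration of the difference between the two approaches, here is a sketch under one loud assumption: that the timings are stored as `f64` nanoseconds, so a 30s run is about 3e10 (this thread does not confirm the unit). The `next_up` below is a hand-written port in the spirit of the standard library's `f64::next_up`, not the exact code used in the fix, and `offset_is_absorbed` is a hypothetical helper.

```rust
/// Hypothetical port of the standard library's `f64::next_up`: returns the
/// smallest f64 strictly greater than `x` (NaN and +inf are returned as-is).
fn next_up(x: f64) -> f64 {
    const SIGN_MASK: u64 = 0x8000_0000_0000_0000;
    const TINY_BITS: u64 = 0x1; // bit pattern of the smallest positive subnormal

    let bits = x.to_bits();
    if x.is_nan() || bits == f64::INFINITY.to_bits() {
        return x;
    }
    let abs = bits & !SIGN_MASK;
    let next_bits = if abs == 0 {
        TINY_BITS // +0.0 or -0.0 -> smallest positive value
    } else if bits == abs {
        bits + 1 // positive: move away from zero
    } else {
        bits - 1 // negative: move toward zero
    };
    f64::from_bits(next_bits)
}

/// Returns true when adding `offset` to `t` rounds back to `t`.
fn offset_is_absorbed(t: f64, offset: f64) -> bool {
    t + offset == t
}
```

Under the nanosecond assumption, `offset_is_absorbed(30.0e9, 0.000001)` is true, since adjacent `f64` values near 3e10 are roughly 3.8e-6 apart, while `next_up(30.0e9)` is always strictly greater than its input, so the two plotted values can no longer collapse into one.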
Put up a minimal repro here: https://github.com/vlovich/criterion-bug-repro
Notice below that the 5 Gelem/s number is total garbage (+ the NaN in the output). The "Took Xs or N Kelements/s" is printed by the benchmark directly. So something is causing garbage in criterion's measurements (looks like something is injecting NaN).
When HTML reports are enabled, the benchmark crashes because of the NaNs.
Maybe the problem is somehow too many elements?
HTML report crash (backend doesn't matter: both gnuplot and plotters fail the same way).