bluss opened this issue 8 years ago.
Ideally "cargo bench" would output info on all the runs it does; then we could do this as well as use more rigorous statistics (#4). Sadly, I don't currently have the bandwidth to contribute that option to "cargo bench", nor do I quite understand the stabilisation story around the feature (which makes me hesitant to spend time on it at all).
But that's a general comment. More specific to your request: can you elaborate a bit on when you'd want to paper over variability?
In a way, it's just a matter of running the benchmarks more times, to get more chances at a stable timing.
I don't know if it helps you, but stable releases of Rust can run "cargo bench" if you configure the crate not to use the default harness for benchmarks and supply a replacement benchmark framework. This is what lets the matrixmultiply crate produce cargo-benchcmp compatible output from "cargo bench" on a stable Rust release.
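A minimal sketch of such a replacement harness, assuming the bench file is declared as a `[[bench]]` target with `harness = false` in Cargo.toml, so that "cargo bench" just runs its `fn main` on stable. This is not matrixmultiply's code; `bench`, `thousands`, and `format_bench_line` are hypothetical names. The printed line is modeled on libtest's `test NAME ... bench: N ns/iter (+/- M)` shape, which is what cargo-benchcmp reads. The spread here is the raw max minus min of the batches (I believe libtest winsorizes the samples first, so its numbers will differ somewhat).

```rust
// Sketch of a custom bench target for stable Rust.
// Assumption: Cargo.toml declares this file as a [[bench]] with `harness = false`,
// so "cargo bench" simply runs this `main`.
use std::hint::black_box;
use std::time::Instant;

/// Time `f` over `samples` batches of `iters` calls each and return
/// (median ns/iter, max - min ns/iter) across the batches.
fn bench<F: FnMut()>(mut f: F, samples: usize, iters: u32) -> (u64, u64) {
    assert!(samples > 0 && iters > 0);
    let mut per_iter: Vec<u64> = (0..samples)
        .map(|_| {
            let start = Instant::now();
            for _ in 0..iters {
                f();
            }
            (start.elapsed().as_nanos() / u128::from(iters)) as u64
        })
        .collect();
    per_iter.sort_unstable();
    let median = per_iter[per_iter.len() / 2];
    let spread = per_iter[per_iter.len() - 1] - per_iter[0];
    (median, spread)
}

/// Format an integer with ',' thousands separators, e.g. 1234567 -> "1,234,567".
fn thousands(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::new();
    for (i, c) in digits.chars().enumerate() {
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(c);
    }
    out
}

/// Build one result line in the libtest bench shape (number right-aligned to 11 chars).
fn format_bench_line(name: &str, median: u64, spread: u64) -> String {
    format!(
        "test {} ... bench: {:>11} ns/iter (+/- {})",
        name,
        thousands(median),
        thousands(spread)
    )
}

fn main() {
    let data: Vec<u64> = (0..1_000).collect();
    // black_box keeps the optimizer from deleting the work being timed.
    let (median, spread) = bench(
        || {
            black_box(black_box(&data).iter().sum::<u64>());
        },
        50,
        1_000,
    );
    println!("{}", format_bench_line("sum_1000", median, spread));
}
```

Running "cargo bench" several times and saving each output to a file then gives you one run per file to compare or merge.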
That is very interesting. I didn't know that was possible. I want to look into it, but I'm not sure when I'll have time. I'll try to at least look into this (and report back) this year, unless someone else beats me to it, of course ;)
Here's how I've been solving this problem so far: a simple script that merges two or more benchmark files, keeping the best time for each benchmark.
https://gist.github.com/bluss/d8d65ecb093fa324de77eb145e83cee8
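The gist's contents aren't reproduced here; as a hypothetical illustration of the same idea, this Rust sketch parses libtest-style `test NAME ... bench: N ns/iter (+/- M)` lines from several saved outputs and keeps the minimum ns/iter per benchmark name. `parse_line` and `merge_best` are made-up names, the input strings in `main` are invented sample data rather than real measurements, and the `+/-` spread of the winning run is simply dropped.

```rust
use std::collections::BTreeMap;

/// Parse one libtest-style line such as
/// "test sum_1000 ... bench:       1,234 ns/iter (+/- 56)"
/// into (name, ns_per_iter). Any other line (headers, ignored tests) yields None.
fn parse_line(line: &str) -> Option<(String, u64)> {
    let rest = line.strip_prefix("test ")?;
    let (name, rest) = rest.split_once(" ... bench:")?;
    let ns_field = rest.split_whitespace().next()?;
    let ns: u64 = ns_field.replace(',', "").parse().ok()?;
    Some((name.trim().to_string(), ns))
}

/// Merge several benchmark outputs, keeping the fastest (minimum) ns/iter per name.
fn merge_best(outputs: &[&str]) -> BTreeMap<String, u64> {
    let mut best: BTreeMap<String, u64> = BTreeMap::new();
    for output in outputs {
        for (name, ns) in output.lines().filter_map(parse_line) {
            best.entry(name)
                .and_modify(|b| *b = (*b).min(ns))
                .or_insert(ns);
        }
    }
    best
}

fn main() {
    // Invented sample outputs from two runs of the same benchmarks.
    let run1 = "running 2 tests\n\
                test a ... bench:       1,500 ns/iter (+/- 40)\n\
                test b ... bench:         900 ns/iter (+/- 10)\n";
    let run2 = "test a ... bench:       1,420 ns/iter (+/- 35)\n\
                test b ... bench:         950 ns/iter (+/- 12)\n";
    for (name, ns) in merge_best(&[run1, run2]) {
        println!("{name}: {ns} ns/iter"); // prints "a: 1420 ns/iter", then "b: 900 ns/iter"
    }
}
```

Taking the minimum rather than the mean is the "best of N" choice: noise such as warmup or background load can only make a run slower, so the fastest observation tends to be the least disturbed one.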
One relatively simple way to paper over benchmark variability (for example, CPU warmup effects) is to pick the best time out of multiple runs. The tool could accept multiple input files for both the "before" and the "after" side.
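A rough sketch of what that proposed mode could compute, assuming each side has already been parsed into a map from benchmark name to its ns/iter in every run. `Row` and `compare_best` are hypothetical, not part of cargo-benchcmp, and the column set and percentage convention are my own choice for illustration. Each side is reduced to its best (minimum) time before comparing.

```rust
use std::collections::BTreeMap;

/// One row of a before/after comparison using best-of-N timings.
#[derive(Debug, PartialEq)]
struct Row {
    name: String,
    before: u64,   // best ns/iter over all "before" runs
    after: u64,    // best ns/iter over all "after" runs
    diff_pct: f64, // (after - before) / before * 100; negative means faster
}

/// `before` and `after` each map a benchmark name to its ns/iter from every run.
/// Only benchmarks with at least one run on both sides are compared.
fn compare_best(
    before: &BTreeMap<String, Vec<u64>>,
    after: &BTreeMap<String, Vec<u64>>,
) -> Vec<Row> {
    let mut rows = Vec::new();
    for (name, b_runs) in before {
        let best_after = after.get(name).and_then(|runs| runs.iter().min());
        if let (Some(&b), Some(&a)) = (b_runs.iter().min(), best_after) {
            if b == 0 {
                continue; // avoid dividing by zero on a degenerate timing
            }
            let diff_pct = (a as f64 - b as f64) / b as f64 * 100.0;
            rows.push(Row { name: name.clone(), before: b, after: a, diff_pct });
        }
    }
    rows
}

fn main() {
    // Invented timings: three "before" runs and two "after" runs of one benchmark.
    let before: BTreeMap<String, Vec<u64>> =
        [("a".to_string(), vec![1500, 1420, 1460])].into_iter().collect();
    let after: BTreeMap<String, Vec<u64>> =
        [("a".to_string(), vec![1300, 1350])].into_iter().collect();
    for row in compare_best(&before, &after) {
        // prints "a: 1420 -> 1300 ns/iter (-8.45%)"
        println!("{}: {} -> {} ns/iter ({:+.2}%)", row.name, row.before, row.after, row.diff_pct);
    }
}
```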