Consider using the standard benchmark harness instead of criterion

facebook / winterfell

A STARK prover and verifier for arbitrary computations

MIT License

795 stars 180 forks source link

We currently use criterion for benchmarking. The main limitation of criterion is that only a crate's public API can be tested.

In contrast, the standard benchmark harness (provided by the unstable test crate) allows you to mark any function with #[bench] (as you would a #[test]) - allowing you to benchmark internal functions. I believe this would be so valuable that it would outweigh the cons of using test. While working on #247, we were interested in knowing the specific timing of internal functions (e.g. here). I ended up manually inserting timing statements in the code, running the right test, and inspecting the output. IMO, this shows how limiting using criterion really is.

Note that criterion has a feature to allow benchmarking internal functions similar to #[bench], which depends on the custom_test_framework nightly feature . However, the tracking issue for custom_test_framework has been closed due to inactivity. So I personally would stay away from it if/until that changes.

Pros of using `test` over `criterion`

Ability to benchmark internal functions

Cons of using `test` over `criterion`

Less sophisticated reporting
- e.g. criterion gives you the performance increase over the last run of the benchmark
No generated html reports
Probably less reliable
- e.g. criterion pre-populates the cache before running a benchmark (not sure whether or not test does that too, but at least it's not advertised in the benchmark output)
test is still unstable

Summary

Although there are more cons than pros, I believe the ability to benchmark internal functions far outweighs any of the cons (as explained earlier). We can deal with the dependency on nightly by adding a "benchmarking" feature, which controls the use of nightly.

#![cfg_attr(feature = "benchmarking", feature(test))]

#[cfg(feature = "benchmarking")]
extern crate test;

fn internal_function() { ... }

#[cfg(test)]
#[cfg(feature = "benchmarking")]
mod bench {
    use super::*;
    use test::Bencher;

    #[bench]
    fn bench_internal_function(b: &mut Bencher) {
        b.iter(|| internal_function()));
    }
}

$ cargo +nightly bench --features=benchmarking bench_internal_function

fn internal_function() { ... } #[cfg(test)] pub mod tests { use super::*; use criterion::{black_box, Criterion}; use std::time::Duration; fn benchmarks(c: &mut Criterion) { c.bench_function("bench_internal_function", |b| { b.iter(|| internal_function()) }); } #[test] fn bench() { // `cargo test --profile bench -j1 -- --nocapture bench -- <benchmark_filter> // This workaround allows benchmarking private interfaces with `criterion` in stable rust. // Collect cli arguments let args: Vec<String> = std::env::args().collect(); // Interpret sequence of args `[ "...bench", "--", "[filter]" ]` as a trigger and extract `filter` let filter = args .windows(3) .filter(|p| p.len() >= 2 && p[0].ends_with("bench") && p[1] == "--") .map(|s| s.get(2).unwrap_or(&"".to_string()).clone()) .next(); // Return successfully if trigger wasn't found let filter = match filter { None => return, Some(f) => f, }; let profile_time = args .windows(2) .filter(|p| p.len() == 2 && p[0] == "--profile-time") .map(|s| s[1].as_str()) .next(); // TODO: Adopt `Criterion::configure_from` when it lands upstream let mut c = Criterion::default() .with_output_color(true) .without_plots() .with_filter(&filter) .warm_up_time(Duration::from_secs_f32(0.5)) .measurement_time(Duration::from_secs_f32(0.5)) .profile_time(profile_time.map(|s| Duration::from_secs_f32(s.parse().unwrap()))); benchmarks(&mut c); Criterion::default().final_summary(); // TODO: Move this function to a macro } }

facebook / winterfell