bheisler / criterion.rs

Statistics-driven benchmarking library for Rust
Apache License 2.0

Customize the number of iterations #625

Open lmtss opened 2 years ago

lmtss commented 2 years ago

I want to use Criterion to test a physics engine, where each iteration corresponds to one simulation step. As I understand it, Criterion automatically chooses the number of iterations, so two algorithms can end up with different iteration counts, which means different test data. Suppose the first algorithm runs 1000 iterations and the second runs 2000: the first algorithm is then measured on frames 1 through 1000, while the second is measured on frames 1 through 2000. So I would like to be able to customize the number of iterations.

My idea can probably best be expressed in code. BenchmarkConfig:

pub struct BenchmarkConfig {
    pub confidence_level: f64,
    pub measurement_time: Duration,
    pub noise_threshold: f64,
    pub nresamples: usize,
    pub sample_size: usize,
    pub significance_level: f64,
    pub warm_up_time: Duration,
    pub sampling_mode: SamplingMode,
    pub quick_mode: bool,
    // my modification
    pub linear_coefficient_of_iterations: u64,
}

ActualSamplingMode:

pub(crate) enum ActualSamplingMode {
    Linear,
    Flat,
    // my modification
    CustomLinear,
}
impl ActualSamplingMode {
    pub(crate) fn iteration_counts(
        &self,
        warmup_mean_execution_time: f64,
        sample_count: u64,
        target_time: &Duration,
        config: &BenchmarkConfig, // my modification
    ) -> Vec<u64> {
        match self {
            // my modification: per-sample iteration counts grow as k, 2k, 3k, ...
            ActualSamplingMode::CustomLinear => (1..=sample_count)
                .map(|a| a * config.linear_coefficient_of_iterations)
                .collect::<Vec<u64>>(),
            // ... the existing Linear and Flat arms are unchanged ...
        }
    }
}
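
For example, with sample_size = 3 and linear_coefficient_of_iterations = 500, the iteration counts would always be [500, 1000, 1500], independent of how fast the algorithm runs, so both algorithms would be measured over the same, predictable test data.
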
mamcx commented 1 year ago

I have the same problem. I am benchmarking a database and need to keep the number of batches small (just 10 or so):

//! Benchmarks for evaluating how we fare against sqlite

use criterion::measurement::WallTime;
use criterion::{
    criterion_group, criterion_main, BatchSize, BenchmarkGroup, BenchmarkId, Criterion, SamplingMode, Throughput,
};
use spacetimedb_bench::prelude::*;
use std::time::Duration;

pub const DB_POOL: u8 = 10;

fn build_group<'a>(c: &'a mut Criterion, named: &str, run: Runs) -> BenchmarkGroup<'a, WallTime> {
    let mut group = c.benchmark_group(named);
    group.throughput(Throughput::Elements(run as u64));

    // This is 10, but there is no way to say "only run 10 times".
    group.sample_size(DB_POOL as usize);
    group.sampling_mode(SamplingMode::Flat);
    group
}

fn bench_insert_tx_per_row(c: &mut Criterion) {
    let run = Runs::Tiny;
    let mut group = build_group(c, "insert_row", run);

    group.bench_function(BenchmarkId::new(SQLITE, 1), |b| {
        let mut db_instance = 0;
        b.iter_batched(
            || {
                let path = sqlite::create_db(db_instance).unwrap();
                db_instance += 1;
                path
            },
            |data| {
                let mut conn = sqlite::open_conn2(&data).unwrap();
                sqlite::insert_tx_per_row(&mut conn, run).unwrap();
            },
            BatchSize::NumBatches(DB_POOL as u64),
        );
    });

    group.finish();
}

criterion_group!(benches, bench_insert_tx_per_row);
criterion_main!(benches);
bheisler commented 1 year ago

Hello! I'm not really the maintainer anymore, but...

Each iteration corresponds to a step.

I would not recommend doing that. Your benchmark should set up a simulation, run a fixed number of steps, then discard the simulation and run a whole new one for the next iteration. Criterion assumes that iterations are independent of one another, so advancing the same simulation one step per iteration could introduce correlations or trends that shouldn't be there and that would throw off the statistics.

Additionally, Criterion's analysis process requires that Criterion choose the iteration count in order to calculate some of the statistics.
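
Something like this sketch is the pattern I mean, using iter_batched so the per-iteration setup is not timed (Simulation and STEPS are just placeholders, not part of Criterion):

use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

// Placeholder standing in for a real physics engine.
struct Simulation;
impl Simulation {
    fn new() -> Self {
        Simulation
    }
    fn step(&mut self) {}
}

const STEPS: usize = 100;

fn bench_simulation(c: &mut Criterion) {
    c.bench_function("simulate_100_steps", |b| {
        b.iter_batched(
            // Setup (not timed): a fresh simulation for every iteration.
            Simulation::new,
            // Routine (timed): a fixed number of steps, so every iteration
            // measures the same work and iterations stay independent.
            |mut sim| {
                for _ in 0..STEPS {
                    sim.step();
                }
            },
            BatchSize::PerIteration,
        );
    });
}

criterion_group!(benches, bench_simulation);
criterion_main!(benches);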

mamcx commented 1 year ago

I understand; the problem is that, in combination with #631, it is very hard to bench things like this. And I haven't found a way to get a relatively small number of iterations (the lowest I can get is around 500):

fn build_group<'a>(c: &'a mut Criterion, named: &str, run: Runs) -> BenchmarkGroup<'a, WallTime> {
    let mut group = c.benchmark_group(named);
    // We need to restrict the amount of iterations and set the benchmark for "large" operations.
    group.throughput(Throughput::Elements(run as u64));
    group.sample_size(DB_POOL as usize);
    group.sampling_mode(SamplingMode::Flat);
    group.measurement_time(Duration::from_millis(1000)); // <-- This reduces it, but moving it up or down doesn't make a clear change?

    group
}

fn bench_insert_tx_per_row(c: &mut Criterion) {
    let run = Runs::Tiny;
    let mut group = build_group(c, "insert_row", run);

    group.bench_function(BenchmarkId::new(SQLITE, 1), |b| {
        let mut db_instance = 0;
        b.iter_batched(
            || {
                let path = sqlite::create_db(db_instance).unwrap();
                db_instance += 1;
                path
            },
            |data| {
                let mut conn = sqlite::open_conn(&data).unwrap();
                sqlite::insert_tx_per_row(&mut conn, run).unwrap();
            },
            BatchSize::NumBatches(DB_POOL as u64),
        );
    });
    group.finish();
}

Another ideal setup for my case would be to say "generate 1 DB, run ten iterations, but don't take the setup/teardown into account", i.e.:

create database

bench(inserts)

drop database
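
The closest I've found to that is iter_batched: the setup closure is not timed, and whatever the routine returns is only dropped after the measurement, so at least the setup/teardown can be kept out of the timing, even though the iteration count is still Criterion's choice. Rough sketch, where create_db and run_inserts stand in for the sqlite helpers above:

use criterion::{BatchSize, Criterion};

fn bench_inserts(c: &mut Criterion) {
    c.bench_function("inserts", |b| {
        b.iter_batched(
            // Setup (not timed): create a fresh database for each iteration.
            || create_db().unwrap(),
            // Routine (timed): only the inserts are measured. Returning the
            // database hands it back to Criterion, which drops it after the
            // measurement, so the teardown is not counted either.
            |mut db| {
                run_inserts(&mut db);
                db
            },
            BatchSize::PerIteration,
        );
    });
}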