bheisler / criterion.rs

Statistics-driven benchmarking library for Rust
Apache License 2.0

Docs unclear on how samples are collected. #647

Open vext01 opened 1 year ago

vext01 commented 1 year ago

Hi,

I've been experimenting with criterion and, whilst it looks good (really good), after reading the docs I'm still at a loss as to how exactly samples are collected.

Suppose we have a benchmark that, on the machine in question, takes about a second of wall-clock time. We'd want to run it many times to get enough samples (usually > 30), so we could collect 50 samples in about 50 seconds. I've written simple benchmark loops that look like this (black boxes elided):

// One timed run of the benchmark per sample.
for sample_idx in 0..50 {
  start = time();
  benchmark();
  stop = time();
  samples.push(stop - start);
}

But criterion also has the notion of a measurement time, from which it seems to calibrate something it calls "iterations", e.g.:

Benchmarking trace-decode-disasm/YkPT/10: Collecting 50 samples in estimated 60.571 s (500 iterations)

And I've seen the "500 iterations" bit change.

It's not clear to me what an iteration is here and where it comes from. I assume it is trying to fill the time budget (measurement_time) with as many runs of the benchmark as possible. But as we can see above, it is doing this without varying the number of samples collected.
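For what it's worth, the numbers in that log line can be unpacked with a little arithmetic. The sketch below assumes (this is not confirmed anywhere in this thread) that the reported iteration count is the total across all samples and that every sample runs the same number of iterations. The `Plan` struct and its methods are made up for illustration and are not part of criterion.

```rust
/// Figures copied from the log line above (hypothetical helper type).
struct Plan {
    samples: u64,
    total_iterations: u64,
    estimated_secs: f64,
}

impl Plan {
    /// Estimated wall-clock time of a single benchmark run.
    fn secs_per_iteration(&self) -> f64 {
        self.estimated_secs / self.total_iterations as f64
    }

    /// Iterations per sample, ASSUMING the total is split evenly
    /// across samples.
    fn iterations_per_sample(&self) -> u64 {
        self.total_iterations / self.samples
    }
}

fn main() {
    let plan = Plan {
        samples: 50,
        total_iterations: 500,
        estimated_secs: 60.571,
    };
    println!("~{:.3} s per iteration", plan.secs_per_iteration());
    println!(
        "{} iterations per sample (if split evenly)",
        plan.iterations_per_sample()
    );
}
```

Under those assumptions the line would mean roughly 0.121 s per run and 10 runs per sample. If the count is instead per sample, the per-run estimate would be 50 times smaller, which is exactly the ambiguity being asked about.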

Therefore, is what is happening conceptually:

// Many runs of the benchmark per sample, timed as one block.
for sample_idx in 0..50 {
  start = time();
  for iteration_idx in 0..500 {
    benchmark();
  }
  stop = time();
  samples.push(stop - start);
}

Where the entire outer loop should take at least measurement_time?

I think the docs need to be a little more explicit about what an iteration is, where it comes from, and how it is used, specifically in the parts of the docs aimed at getting started. If someone could confirm (or not) the above, I'd be happy to have a go at a pull request.

Thanks

Bonus question: does criterion have any way to re-run benchmarks in a fresh process to capture variance at the process level?

vext01 commented 1 year ago

After a discussion on Zulip, we think that the loop above is conceptually what criterion is doing, and that the time taken for each sample is then divided by the number of iterations.

This was the missing detail. If that is indeed the case, it'd be nice to update the docs to be clear on this.
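To make that concrete, here is a minimal, standard-library-only sketch of the conceptual loop with the division step included. It is not criterion's implementation: `collect_samples`, its parameters, and the stand-in workload are hypothetical names chosen for illustration, and the real library may do more (warm-up, choosing iteration counts, statistical analysis).

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Collects `samples` timings. Each timing is the mean time per
/// iteration over `iters_per_sample` back-to-back calls of `routine`.
fn collect_samples<F: FnMut()>(
    samples: usize,
    iters_per_sample: u32,
    mut routine: F,
) -> Vec<Duration> {
    assert!(iters_per_sample > 0, "need at least one iteration per sample");
    let mut out = Vec::with_capacity(samples);
    for _ in 0..samples {
        let start = Instant::now();
        for _ in 0..iters_per_sample {
            routine();
        }
        // The missing detail: divide the sample's wall-clock time
        // by the number of iterations it contained.
        out.push(start.elapsed() / iters_per_sample);
    }
    out
}

fn main() {
    // Hypothetical stand-in for the real benchmark body.
    let workload = || {
        let sum: u64 = (0..black_box(10_000u64)).sum();
        black_box(sum);
    };
    let samples = collect_samples(50, 10, workload);
    println!("collected {} samples, first = {:?}", samples.len(), samples[0]);
}
```

Timing a block of iterations and dividing afterwards keeps timer overhead out of each individual measurement, which matters most when a single run is very short.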