Docs unclear on how samples are collected.

Hi,

I've been experimenting with criterion and, whilst it looks good (really good), after reading the docs I'm still at a loss as to how exactly samples are collected.

Suppose we have a benchmark that takes about a second of wallclock time on the machine in question. We'd want to run it many times to get enough samples (usually > 30), so we could collect 50 samples in about 50 seconds. I've written simple benchmark loops that look like this (black boxes elided):

// needs: use std::time::Instant;
let mut samples = Vec::new();
for _sample_idx in 0..50 {
    let start = Instant::now();
    benchmark();
    samples.push(start.elapsed()); // one sample = one timed run
}

But then criterion has the notion of measurement time from which it seems to calibrate something it calls "iterations", e.g.:

Benchmarking trace-decode-disasm/YkPT/10: Collecting 50 samples in estimated 60.571 s (500 iterations)

And I've seen the "500 iterations" bit change.

It's not clear to me what an iteration is here and where it comes from. I assume it is trying to fill the time budget (measurement_time) with as many runs of the benchmark as possible. But as we can see above, it is doing this without varying the number of samples collected.

Therefore, is what is happening conceptually:

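// needs: use std::time::Instant;
let mut samples = Vec::new();
for _sample_idx in 0..50 {
    let start = Instant::now();
    for _iteration_idx in 0..500 {
        benchmark();
    }
    samples.push(start.elapsed()); // one sample = 500 timed runs
}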

Where the entire outer loop should take at least measurement_time?
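
To make the question concrete, here is a self-contained Rust sketch of the model I have in mind. It illustrates my guess only and is not taken from criterion's source: `benchmark` is a hypothetical stand-in workload, `collect_samples` is a name I made up, and dividing each sample by the iteration count is my own assumption about how per-call times would be derived.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the code under test.
fn benchmark() -> u64 {
    (0..1_000u64).fold(0, |acc, x| acc.wrapping_add(black_box(x)))
}

/// Guessed model: each sample times `iters_per_sample` back-to-back calls
/// and records the mean wallclock time per call.
/// Panics if `iters_per_sample` is zero (division by zero).
fn collect_samples(sample_count: usize, iters_per_sample: u32) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(sample_count);
    for _sample_idx in 0..sample_count {
        let start = Instant::now();
        for _iteration_idx in 0..iters_per_sample {
            black_box(benchmark());
        }
        // Normalise so that samples are comparable per-call times.
        samples.push(start.elapsed() / iters_per_sample);
    }
    samples
}
```

Calling `collect_samples(50, 500)` from `main` would correspond to the numbers in the log line above, if "500 iterations" really means iterations per sample. That is one of the things I'm unsure about.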

I think the docs need to be a little more explicit about what an iteration is, where it comes from, and how it is used, specifically in the parts aimed at getting started. If someone could confirm (or not) the above, I'd be happy to have a go at a pull request.

Thanks

Bonus question: does criterion have any way to re-run benchmarks in a fresh process to capture variance at the process level?
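
To illustrate what I mean by process-level variance, this is roughly what I'd hand-roll if criterion has nothing built in. It is a minimal sketch using only the standard library. The function name `time_fresh_processes` is my own, and the timings include process startup and teardown, so it only makes sense for benchmarks that run long enough to dwarf that overhead.

```rust
use std::io;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Run `program` with `args` in `runs` fresh processes, one after another,
/// and return the wallclock time of each run (startup cost included).
fn time_fresh_processes(
    program: &str,
    args: &[&str],
    runs: usize,
) -> io::Result<Vec<Duration>> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        // `status()` spawns the child and blocks until it exits.
        let status = Command::new(program)
            .args(args)
            .stdout(Stdio::null())
            .status()?;
        let elapsed = start.elapsed();
        if !status.success() {
            return Err(io::Error::new(
                io::ErrorKind::Other,
                format!("{program} exited with {status}"),
            ));
        }
        samples.push(elapsed);
    }
    Ok(samples)
}
```

Each element of the returned vector is then one process-level sample, which could be fed into whatever statistics one would otherwise apply to in-process samples.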
