Elaborate the statistical model

harendra-kumar commented 3 years ago

Referring to this passage in the docs:

Here is a procedure used by tasty-bench to measure execution time:

1. Set $n \leftarrow 1$.
2. Measure execution time $t_n$ of $n$ iterations and execution time $t_{2n}$ of $2n$ iterations.
3. Find $t$ which minimizes deviation of $(nt, 2nt)$ from $(t_n, t_{2n})$.
4. If deviation is small enough (see --stdev below), return $t$ as a mean execution time.
5. Otherwise set $n \leftarrow 2n$ and jump back to Step 2.
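The doubling procedure quoted above can be sketched in Haskell. This is an illustration only, not tasty-bench's code: it uses the least-squares estimate of $t$ given later in this thread, and it assumes that "deviation" is the Euclidean distance between $(nt, 2nt)$ and $(t_n, t_{2n})$ and that "small enough" means this distance, relative to $nt$, is below the --stdev threshold. The names estimateT, deviation, measureLoop and fakeMeasure are hypothetical.

```haskell
module Main (main) where

-- | Least-squares estimate of the per-iteration time t, given the total
-- time tN of n iterations and the total time tTwoN of 2n iterations.
estimateT :: Int -> Double -> Double -> Double
estimateT n tN tTwoN = (tN + 2 * tTwoN) / (5 * fromIntegral n)

-- | Assumed deviation measure: Euclidean distance between the predicted
-- pair (n*t, 2n*t) and the measured pair (tN, tTwoN).
deviation :: Int -> Double -> Double -> Double -> Double
deviation n t tN tTwoN = sqrt ((nt - tN) ^ two + (2 * nt - tTwoN) ^ two)
  where
    nt = fromIntegral n * t
    two = 2 :: Int

-- | Steps 1-5: start with n = 1 and double n until the deviation, relative
-- to n*t, is at most the threshold. 'measure k' returns the total time of
-- k iterations. Like the procedure itself, this may loop forever.
measureLoop :: Double -> (Int -> IO Double) -> IO (Int, Double)
measureLoop threshold measure = go 1
  where
    go n = do
      tN <- measure n
      tTwoN <- measure (2 * n)
      let t = estimateT n tN tTwoN
          dev = deviation n t tN tTwoN
      if dev <= threshold * fromIntegral n * t
        then pure (n, t)
        else go (2 * n)

-- | Hypothetical deterministic timer: 1000 time units of fixed overhead
-- per measurement plus 2 units per iteration.
fakeMeasure :: Int -> IO Double
fakeMeasure k = pure (1000 + 2 * fromIntegral k)

main :: IO ()
main = measureLoop 0.05 fakeMeasure >>= print
```

With the fake timer, the fixed overhead keeps the relative deviation above 5% until the per-iteration cost dominates; the demo stops at n = 8192 and prints (8192,2.0732421875).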

I stumbled on "Find $t$ which minimizes deviation of $(nt, 2nt)$ from $(t_n, t_{2n})$". Do you calculate the mean per iteration, i.e. $(t_n + t_{2n}) / (3n)$, and then the deviation of the two points from that mean? If that standard deviation is below 5%, do you stop, and otherwise continue?

What happens if the deviation never drops below the threshold? Is --timeout the only bailout option? What is the default behavior? Does it continue forever?

Bodigrim commented 3 years ago

It's actually $t = (t_n + 2 t_{2n}) / (5n)$ that minimizes the deviation. Thanks, I'll clarify this.
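Assuming "deviation" means the sum of squared differences (an ordinary least-squares fit of a line through the origin), the formula follows from setting the derivative to zero:

```latex
\begin{aligned}
D(t)  &= (nt - t_n)^2 + (2nt - t_{2n})^2 \\
D'(t) &= 2n\,(nt - t_n) + 4n\,(2nt - t_{2n}) = 10n^2 t - 2n\,t_n - 4n\,t_{2n} \\
D'(t) = 0 &\iff t = \frac{2n\,t_n + 4n\,t_{2n}}{10n^2} = \frac{t_n + 2\,t_{2n}}{5n} \\
D''(t) &= 10n^2 > 0 \quad \text{(so this is a minimum)}
\end{aligned}
```

By contrast, $(t_n + t_{2n}) / (3n)$ from the question is total time divided by total iterations; it weights the two measurements differently and does not minimize $D(t)$ in general (e.g. $n = 1$, $t_n = 1$, $t_{2n} = 3$ gives $4/3$ versus $7/5$).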

If the deviation never drops below the threshold and --timeout is not set, then after 100 seconds a diagnostic message is printed to stderr, urging the user to relax --stdev or set --timeout. Apart from that, the measurement keeps going forever.
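The stopping behavior described above can be summarized as a small decision function. This is a hypothetical sketch: Status, Action and decide are invented names, and the only number taken from this comment is the 100-second hint threshold. Two points are assumptions: the hint is printed only once here, and what tasty-bench does once a user-supplied --timeout expires is not described in this thread, so TimedOut simply stands for "stop measuring".

```haskell
module Main (main) where

import Data.Maybe (isNothing)

-- | State of one benchmark's measurement loop (hypothetical).
data Status = Status
  { converged :: Bool   -- relative deviation already below --stdev
  , warned    :: Bool   -- diagnostic hint already printed to stderr
  , elapsed   :: Double -- seconds spent on this benchmark so far
  }

data Action
  = Report           -- return the estimate
  | Continue         -- double n and measure again
  | WarnAndContinue  -- print the hint to stderr, then keep measuring
  | TimedOut         -- user-supplied --timeout expired: stop measuring
  deriving (Eq, Show)

-- | Decide the next step; the first argument is --timeout in seconds, if set.
decide :: Maybe Double -> Status -> Action
decide timeout st
  | converged st = Report
  | Just limit <- timeout, elapsed st >= limit = TimedOut
  | isNothing timeout, not (warned st), elapsed st >= 100 = WarnAndContinue
  | otherwise = Continue

main :: IO ()
main = mapM_ (print . uncurry decide)
  [ (Nothing, Status False False 50)   -- Continue
  , (Nothing, Status False False 120)  -- WarnAndContinue
  , (Nothing, Status False True 500)   -- Continue: no bailout without --timeout
  , (Just 60, Status False False 61)   -- TimedOut
  , (Nothing, Status True False 120)   -- Report
  ]
```

Note that without --timeout there is no branch that stops an unconverged benchmark, which mirrors the "continue forever" default.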

Unless a user explicitly asks for lower-quality results (via --stdev or --timeout), IMO it's wrong for a benchmark suite to stop evaluation early of its own accord. I really hate running an hour-long criterion benchmark, only to discover later from its log that half of the results were dominated by outliers and are nothing but garbage.

Bodigrim commented 3 years ago

Remember that both --stdev and --timeout can be specified locally, per benchmark group. For example, one can wrap everything under defaultMain in localOption (mkTimeout 1e9).
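As an illustration of per-group options, the sketch below assumes the tasty and tasty-bench packages: localOption and mkTimeout come from Test.Tasty (mkTimeout takes microseconds), and RelStDev is assumed to be the option type exported by Test.Tasty.Bench that backs --stdev. The group names and the fib benchmark are hypothetical.

```haskell
module Main (main) where

import Test.Tasty (localOption, mkTimeout)
import Test.Tasty.Bench (RelStDev (..), bench, bgroup, defaultMain, nf)

-- | Hypothetical workload: iterative Fibonacci.
fib :: Int -> Integer
fib n = go n 0 1
  where
    go 0 a _ = a
    go k a b = go (k - 1) b (a + b)

main :: IO ()
main = defaultMain
  [ -- Everything in "all" gets a timeout of 1e9 microseconds (1000 s).
    localOption (mkTimeout 1000000000) $ bgroup "all"
      [ bench "fib 20" $ nf fib 20
        -- Relax the target relative deviation to 10% for this subgroup only.
      , localOption (RelStDev 0.10) $ bgroup "noisy"
          [ bench "fib 30" $ nf fib 30 ]
      ]
  ]
```

The timeout is written as the integer 1000000000 because mkTimeout takes an Integer; the literal 1e9 only type-checks as an Integer with the NumDecimals extension enabled.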

harendra-kumar commented 3 years ago

It would be nice to have some of these details in the docs.

Bodigrim commented 3 years ago

I've updated docs in ab49d86e06c26aa417fe58067d43f748f8bdc0c2, thanks.

Bodigrim / tasty-bench

Elaborate the statistical model #15