NVIDIA / nvbench

CUDA Kernel Benchmarking Library

Entropy-based stopping criterion #151

Closed gevtushenko closed 7 months ago

gevtushenko commented 8 months ago

Closes https://github.com/NVIDIA/nvbench/issues/150 and https://github.com/NVIDIA/nvbench/issues/147.

This PR adds a new command-line option, --stopping-criterion <criterion>, with two predefined criteria, stdrel and entropy, along with an API for customizing the stopping criterion. nvbench/examples/custom_criterion.cu illustrates how custom criteria can be added on a per-run basis. This opens up possibilities for improving performance CI. For instance, one can now develop a criterion that collects a large sample, stores the sample size, and then loads this number on each re-run of performance CI, leading to better stability.
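To make the performance CI idea concrete, here is a minimal, standalone sketch of a criterion that stops after a sample count recorded by an earlier run. The stopping_criterion interface, fixed_count_criterion, and run_until_finished are hypothetical names invented for this illustration; they are not the NVBench API. See nvbench/examples/custom_criterion.cu for the real extension point.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical interface for illustration only; NOT the NVBench API.
struct stopping_criterion
{
  virtual ~stopping_criterion() = default;
  virtual void add_measurement(double seconds) = 0;
  virtual bool is_finished() const = 0;
};

// Stops after a fixed number of samples, e.g. a sample size that an
// earlier, longer run stored and that each CI re-run loads again.
class fixed_count_criterion final : public stopping_criterion
{
  std::int64_t m_target;
  std::int64_t m_seen{0};

public:
  explicit fixed_count_criterion(std::int64_t target)
      : m_target(target)
  {
    assert(target > 0);
  }

  // The measured value is ignored; only the number of samples matters.
  void add_measurement(double /* seconds */) override { ++m_seen; }

  bool is_finished() const override { return m_seen >= m_target; }
};

// Hypothetical driver loop: keeps measuring until the criterion is satisfied
// and returns the number of measurements taken.
template <class Measure>
std::int64_t run_until_finished(stopping_criterion &criterion, Measure &&measure)
{
  std::int64_t taken = 0;
  while (!criterion.is_finished())
  {
    criterion.add_measurement(measure());
    ++taken;
  }
  return taken;
}
```

Because the sample size is fixed up front, every re-run collects the same number of measurements, which is the stability property the paragraph above refers to.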

Apart from the new API, this PR introduces the entropy criterion. To enable it, pass --stopping-criterion entropy. The criterion computes the cumulative entropy of the sample and stores it in an entropy window. It then computes a linear regression over that window. If the angle of the regression line is small enough and the coefficient of determination (R^2) is large enough, the criterion concludes that new samples will not introduce any new information and that the sample is representative. The entropy criterion addresses the concerns from https://github.com/NVIDIA/nvbench/issues/150 and https://github.com/NVIDIA/nvbench/issues/147 and significantly reduces the variation in sample size, which is important for performance CI. Below is a plot of the sample size distribution for the stdrel and entropy criteria, collected on nvbench/examples/throughput.cu, that illustrates this point:

[Figure: stdrel_vs_entropy]
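The entropy criterion described above can be sketched in standalone C++ as follows. This is an illustration of the idea and not the code from this PR: the class name, the binning of measurements by a fixed resolution, the window handling, the use of the absolute angle, and the treatment of a perfectly flat window as R^2 = 1 are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>
#include <map>

// Slope and coefficient of determination of a least-squares line.
struct linear_fit
{
  double slope;
  double r2;
};

// Fits y[i] against x = i. Requires at least two points.
inline linear_fit fit_line(const std::deque<double> &y)
{
  const double n      = static_cast<double>(y.size());
  const double mean_x = (n - 1.0) / 2.0;
  double mean_y       = 0.0;
  for (double v : y) { mean_y += v; }
  mean_y /= n;

  double sxx = 0.0, sxy = 0.0, syy = 0.0;
  for (std::size_t i = 0; i < y.size(); ++i)
  {
    const double dx = static_cast<double>(i) - mean_x;
    const double dy = y[i] - mean_y;
    sxx += dx * dx;
    sxy += dx * dy;
    syy += dy * dy;
  }
  // Assumption: a perfectly flat window counts as a perfect fit.
  const double r2 = (syy == 0.0) ? 1.0 : (sxy * sxy) / (sxx * syy);
  return {sxy / sxx, r2};
}

// Hypothetical sketch of an entropy-based stopping criterion.
class entropy_criterion_sketch
{
  double m_max_angle;  // radians
  double m_min_r2;
  std::size_t m_window;
  double m_resolution; // bin width used to discretize measurements
  std::map<long long, std::size_t> m_counts;
  std::size_t m_total{0};
  std::deque<double> m_entropy; // most recent cumulative entropies

public:
  entropy_criterion_sketch(double max_angle, double min_r2,
                           std::size_t window, double resolution)
      : m_max_angle(max_angle), m_min_r2(min_r2),
        m_window(window), m_resolution(resolution)
  {
    assert(window >= 2 && resolution > 0.0);
  }

  void add_measurement(double t)
  {
    ++m_counts[std::llround(t / m_resolution)];
    ++m_total;

    // Shannon entropy (bits) of all measurements seen so far.
    double h = 0.0;
    for (const auto &kv : m_counts)
    {
      const double p = static_cast<double>(kv.second) / static_cast<double>(m_total);
      h -= p * std::log2(p);
    }
    m_entropy.push_back(h);
    if (m_entropy.size() > m_window) { m_entropy.pop_front(); }
  }

  bool is_finished() const
  {
    if (m_entropy.size() < m_window) { return false; }
    const linear_fit fit = fit_line(m_entropy);
    const double angle   = std::atan(fit.slope);
    return std::abs(angle) <= m_max_angle && fit.r2 >= m_min_r2;
  }
};
```

In this sketch, a stream of identical measurements keeps the entropy at zero, so the criterion is satisfied as soon as the window fills. A stream in which every measurement lands in a new bin makes the entropy grow like log2(n), so the regression line stays steep and sampling continues.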

Below is an example where stdrel observed a small variance and decided to stop, but entropy observed that the entropy was still growing and kept sampling, discovering new modes:

[Figures: large, large_4]

At other times, entropy notices that new measurements add nothing new to the sample and stops earlier:

[Figures: small_2, small_3, small_5]

Each criterion has its own set of parameters. Parameters like --max-noise and --min-time only affect the stdrel criterion, whereas --max-angle and --min-r2 are parameters of the entropy criterion.

For now, stdrel stays the default criterion. The decision on switching the default will be made after some field experience.