BurntSushi / cargo-benchcmp

A small utility to compare Rust micro-benchmarks.
The Unlicense
343 stars 21 forks source link

Use in CI #43

Closed danieleades closed 5 years ago

danieleades commented 5 years ago

it'd be amazing to be able to use this in CI to reject pull requests that introduce performance regressions.

There are a couple of things missing that would be needed to support this

  1. exit with error code if there are regressions. This would probably be behind a '--check' flag to be consistent with rust fmt.
  2. filter by statistical significance (#39). Filtering by percentage is essentially meaningless. This would be essential to reduce the noise in CI. This could be configurable by either P-value or sigma, with a sensible default. Doesn't have to be anything too fancy- the difference between two averages with known variance is itself a normally distributed random variable. you just need the P-value that the distribution includes 0, or the number of standard deviations that the mean is from 0. plenty of literature out there for doing this.
BurntSushi commented 5 years ago

This is unlikely to ever happen. I wouldn't be opposed to a simple PR for (1), but it's not clear that the necessary information for (2) is available.

In any case, there are reasons not to do this:

  1. Firstly, this is built around the default benchmark harness. I personally no longer use that, since it requires nightly Rust, and instead have been slowly switching over to criterion. So I don't really do much with this particular project any more, other than keep it running. I use critcmp now instead, for doing comparisons on criterion benchmarks.
  2. Running benchmarks in CI has a lot of downsides. Firstly, they can increase CI times dramatically. Secondly, the CI environment is unlikely to be very consistent, such that a benchmark could erroneously dip below the desired threshold, increasing the chances of a false negative.

In any case, my point here isn't to get into a full debate about this, but rather, to state why, as the maintainer of this project, I'm not particularly interested in this addition.