BrianHicks / elm-benchmark

Benchmarking for Elm
http://package.elm-lang.org/packages/BrianHicks/elm-benchmark/latest
BSD 3-Clause "New" or "Revised" License

Add warnings, errors, and tips to benchmark report #13

Closed jwoudenberg closed 6 years ago

jwoudenberg commented 7 years ago

In the same vein as the Elm compiler, it would be really nice if elm-benchmark gave us warnings, errors, and tips to help us write better benchmarks. From working with Brian a bit, I know he has tons of context on this, part of which could be automatically distributed in the benchmark report.

Below is an outline of some of Brian's tips I remember, to give an idea of the type of helpful messages that could be displayed.

BrianHicks commented 6 years ago

FWIW I'm reducing these numbers down to two in the next version: runs per second and goodness of fit. Runs per second is pretty self-descriptive, but goodness of fit is not. In the new version, we vary sample size in order to generate a trend line, and goodness of fit is a measure of errors in the trend. It's expressed as a percentage, and higher is better. So this advice will end up close to:

  1. total samples are low (number TBD but related to samples/bucket): same advice as "run counts are low" above.
  2. 5% of buckets have points outside 2 sigma (exact numbers TBD): high outlier count, try re-running (just reloading the page will keep the JIT hot enough to avoid these, usually.)
  3. goodness of fit is less than 95%: there may be interference on the system. Try closing programs or tabs that are consuming significant system resources (Slack, Spotify are typical candidates) and re-running.
  4. goodness of fit is less than 85%: There's something really wrong, don't trust these results.
    1. same advice on closing heavy tabs or programs
    2. if that doesn't solve it, try increasing the sample time
    3. if that doesn't solve it, show up in #elm-benchmark on the Elm Slack and we'll try to get you sorted out. There's probably some error this tool can't detect, or we need to account for your system setup in the sampling approach.
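
The thresholds in the list above could be wired into a report roughly as follows. This is only a sketch in Python, not elm-benchmark's code: the function name `benchmark_advice`, its parameters, and the `MIN_TOTAL_SAMPLES` value are all hypothetical (the thread leaves the sample minimum as TBD), and it assumes goodness of fit and the outlier share are given as fractions between 0 and 1.

```python
from typing import List

# Hypothetical placeholder: the thread leaves the real minimum as "TBD".
MIN_TOTAL_SAMPLES = 100


def benchmark_advice(
    total_samples: int,
    outlier_bucket_fraction: float,
    goodness_of_fit: float,
    min_total_samples: int = MIN_TOTAL_SAMPLES,
) -> List[str]:
    """Return advice messages for one benchmark result.

    outlier_bucket_fraction: share of buckets (0.0 to 1.0) that have
        points outside 2 sigma.
    goodness_of_fit: fraction from 0.0 to 1.0, higher is better.
    """
    messages = []

    if total_samples < min_total_samples:
        messages.append("Total samples are low.")

    if outlier_bucket_fraction >= 0.05:
        messages.append("High outlier count: try re-running.")

    # Check the stricter threshold first so only one fit message is shown.
    if goodness_of_fit < 0.85:
        messages.append(
            "Goodness of fit is below 85%: don't trust these results. "
            "Close heavy tabs or programs, then try increasing the sample time."
        )
    elif goodness_of_fit < 0.95:
        messages.append(
            "Goodness of fit is below 95%: there may be interference on the "
            "system. Close heavy programs or tabs and re-run."
        )

    return messages
```

A clean run such as `benchmark_advice(1000, 0.0, 0.99)` returns an empty list, so the report would show no warnings at all.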

BrianHicks commented 6 years ago

Also, the new approach solves these in the following ways:

In addition, I'm adding lots of charts. Just looking at the data shows problems more often than you'd suspect: humans are very good at noticing "hey, that's weird..." and not trusting the results. For example, I can show the points, which makes outliers easy to see, as well as jags due to system spikes. If I show the trend line, it'll be obvious whether it's a good or bad fit (it's kinda susceptible to outliers).
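
To illustrate how sensitive a trend line is to outliers, here is a small Python sketch. It assumes the trend is an ordinary least-squares line through (sample size, elapsed time) points and that goodness of fit is the usual R² of that line. The thread does not say which measure the new version actually uses, so treat both choices, and the names `fit_line` and `r_squared`, as assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (sample size, elapsed time)


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx  # requires at least two distinct sample sizes
    return slope, mean_y - slope * mean_x


def r_squared(points: List[Point]) -> float:
    """R² of the least-squares line: 1.0 is a perfect fit."""
    slope, intercept = fit_line(points)
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)  # assumes timings vary
    return 1.0 - ss_res / ss_tot


# Made-up timings that grow linearly with sample size.
clean = [(n, 2.0 * n) for n in (10, 20, 30, 40, 50)]

# The same data with one system spike at sample size 30.
spiked = [(10, 20.0), (20, 40.0), (30, 200.0), (40, 80.0), (50, 100.0)]

print(r_squared(clean))
print(r_squared(spiked))
```

With only five points, the single spike pulls R² far below the 85% threshold mentioned earlier in the thread, which is the kind of jag a chart of the raw points would make visible immediately.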

BrianHicks commented 6 years ago

moved to elm-explorations/benchmark#4