Open Akirathan opened 1 year ago
As noted, the stddev makes it easier to tell apart benchmarks like the navy and orange ones from this analysis (one has very high variance and the other is very 'stable').
Just a note that in #8091 I added displaying the stddev within the Enso runner for the benchmarks. This is somewhat orthogonal, because that is not the runner used on CI, so the ticket remains open. Still, it is related enough that I thought it was worth noting.
We can see how the stddev allows us to better judge the benchmarks. For example, the high stddev in the Enso variants may show that the warmup time is insufficient and should be increased if we want to see the peak performance.
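To illustrate the warmup point, here is a minimal Python sketch. The iteration times are made up, and `stdev_after_warmup` is a hypothetical helper, not something from JMH or the Enso runner. It only shows how including slow early iterations inflates the stddev, and how discarding more of them as warmup brings it down.

```python
import statistics

# Made-up iteration times (ms): the first few iterations are slow
# because the code has not warmed up yet.
times_ms = [400.0, 250.0, 150.0, 102.0, 100.0, 98.0, 100.0]

def stdev_after_warmup(times, warmup_iterations):
    """Sample stddev of the iterations left after discarding the warmup ones."""
    return statistics.stdev(times[warmup_iterations:])

for warmup in (0, 3):
    print(f"warmup={warmup}: stdev={stdev_after_warmup(times_ms, warmup):.2f} ms")
```

If the stddev keeps dropping as more iterations are treated as warmup, that suggests the measured iterations were not yet at peak performance.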
Our benchmark charts currently display only the score, which is the average time of one benchmark iteration in milliseconds.
Since JMH can, and does, output stddev, it would be nice to:
The stddev is a very important metric, as it tells us how much we can trust the results for a particular benchmark.
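As a rough illustration of why the score alone is not enough, here is a small Python sketch. The numbers are invented and `summarize` is a hypothetical helper, not part of JMH or our tooling. The two benchmarks have the same average score, but only the stddev shows that one of them is far less trustworthy.

```python
import statistics

def summarize(times_ms):
    """Return (mean, sample stddev, relative stddev) for iteration times in ms."""
    mean = statistics.mean(times_ms)
    stdev = statistics.stdev(times_ms)  # sample standard deviation
    return mean, stdev, stdev / mean

# Made-up iteration times (ms), for illustration only.
stable = [100.0, 101.0, 99.0, 100.0, 100.0]
noisy = [60.0, 140.0, 80.0, 120.0, 100.0]

for name, times in (("stable", stable), ("noisy", noisy)):
    mean, stdev, rel = summarize(times)
    print(f"{name}: score={mean:.1f} ms, stdev={stdev:.2f} ms ({rel:.1%})")
```

Showing the relative stddev next to the score is one possible way to make charts comparable across benchmarks with very different absolute times.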