filiph opened 4 years ago
It's easy enough to copy-paste the needed code here.
I went down a rabbit hole of research on how to best present variance in benchmarks (there's a lot of prior art). I have a lot of notes. The gist is that even with MoE / standard deviation, comparing averages is too crude and leads to confusion. I'll investigate further.
My question above still stands: is this in scope of this package?
The other useful numbers would be standard deviation, median, min, and max.
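For illustration, all four are cheap to compute once the raw measurements are kept in a list. A minimal sketch (the `Stats` class and its names are mine, not a proposed API):

```dart
import 'dart:math' as math;

/// Illustrative only: summary statistics over a list of raw
/// benchmark measurements (e.g. times in microseconds).
class Stats {
  final double mean, stdDev, median, min, max;
  Stats(this.mean, this.stdDev, this.median, this.min, this.max);

  factory Stats.from(List<double> xs) {
    final sorted = [...xs]..sort();
    final mean = xs.reduce((a, b) => a + b) / xs.length;
    // Sample standard deviation (n - 1 in the denominator).
    final variance =
        xs.map((x) => math.pow(x - mean, 2)).reduce((a, b) => a + b) /
            (xs.length - 1);
    final mid = sorted.length ~/ 2;
    final median = sorted.length.isOdd
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2;
    return Stats(mean, math.sqrt(variance), median, sorted.first, sorted.last);
  }
}
```

`Stats.from(measurements)` then yields all the numbers in one pass over a sorted copy, with no external dependency.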
For the benchmark measurements to be useful when comparing two or more versions of some code, we need to know the margin of error (MoE). Otherwise, we can't know whether an optimization is actually, significantly better than the base.
Here's what I mean:
Without MoE, this looks good. We made the code almost 2% faster with the second commit, right? No:
We actually have no idea if the new code is faster. But we wouldn't know this without the MoE column, and we might prematurely pick the wrong choice.
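To make this concrete: a margin of error at 95% confidence can be approximated as `1.96 * s / sqrt(n)` under a normal approximation. The numbers below are made up; they only show how two means roughly 2% apart can still have overlapping intervals, i.e. the "improvement" is indistinguishable from noise:

```dart
import 'dart:math' as math;

/// 95% margin of error under a normal approximation (z ≈ 1.96).
/// [stdDev] is the sample standard deviation of [n] measurements.
double marginOfError(double stdDev, int n) => 1.96 * stdDev / math.sqrt(n);

void main() {
  // Hypothetical numbers: base at 100us/iteration, "optimized" at 98us.
  final base = 100.0, opt = 98.0;
  final moe = marginOfError(12.0, 50); // ≈ 3.3us

  // The intervals [base ± moe] and [opt ± moe] overlap exactly when the
  // difference between the means is less than 2 * moe.
  print((base - opt).abs() < 2 * moe); // true: the ~2% win is not significant
}
```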
Right now, `benchmark_harness` only gives a single number. I often resort to running the benchmark many times in order to ascertain the variance of measurements. This is slow and wasteful, because it's basically computing a mean of means: a measurement that could last ~2 seconds takes X * ~2 seconds, where X is always >10 and sometimes ~100.

I'm not sure this is in scope of this package, seeing as this one seems to be focused on really tight loops (e.g. `forEach` vs `addAll`) and on long-term tracking of the SDK itself. Maybe it should be a completely separate package?

I'm proposing something like:
- Pre-fill a list of measurements, e.g. `List.generate(n * batchIterations, (_) => -1)`.
- Run `n` batches, each with `batchIterations` of the actual measured code, and put the measured time into the list.
- Compute statistics over the list. (I use the `t_stats` package, but there are many others on pub, including @kevmoo's `stats`; plus, this is simple enough to be simply implemented without any external dependency.)

PROs:
CONs:
- We run the batch loop (`for (int i = 0; i < batchIterations; i++) { run(); }`) many times.

I know this package is in flux now. Even a simple "no, not here" response is valuable for me.
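For concreteness, here is a rough sketch of the batching idea above. The function name, defaults, and reporting format are hypothetical, not an API suggestion; it just shows that one benchmark run can yield a mean *and* a margin of error:

```dart
import 'dart:math' as math;

/// Sketch of the proposal: run [n] batches of [batchIterations] calls
/// each, record one elapsed time per batch, then report mean ± MoE.
void measure(void Function() run, {int n = 50, int batchIterations = 1000}) {
  // Pre-filled so the measurement loop never grows the list.
  final times = List<double>.generate(n, (_) => -1);
  final watch = Stopwatch();
  for (var batch = 0; batch < n; batch++) {
    watch
      ..reset()
      ..start();
    for (var i = 0; i < batchIterations; i++) {
      run();
    }
    watch.stop();
    // Per-iteration time for this batch, in microseconds.
    times[batch] = watch.elapsedMicroseconds / batchIterations;
  }
  final mean = times.reduce((a, b) => a + b) / n;
  final variance =
      times.map((t) => math.pow(t - mean, 2)).reduce((a, b) => a + b) /
          (n - 1);
  // 95% margin of error under a normal approximation.
  final moe = 1.96 * math.sqrt(variance) / math.sqrt(n);
  print('$mean ± $moe us per iteration');
}
```

Usage would be something like `measure(myBenchmarkBody)`; a real version would presumably also surface median/min/max and guard against the batch-loop overhead mentioned in the CONs.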