golang / go

testing: add ratcheting variants #7465

Open josharian opened 10 years ago

josharian commented 10 years ago
For some testing and benchmark purposes, a ratchet is better suited than an average.

https://golang.org/cl/67870053/ bumps up the number of AllocsPerRun runs of an
http test to avoid flakiness. This test would be more reliable using a lower number of
runs if it could measure the best run rather than the average. In addition, it could set
an explicit (rather than comparative) goal for the number of allocs, which would allow
it to catch other regressions. With care, MinAllocsPerRun could even use heuristics to
avoid requiring the user to pass an explicit number of runs.

For benchmarking tightly CPU-bound code with minimal scheduler/OS interactions, a
ratcheting benchmark will often yield more stable, useful results than an averaging
benchmark.

ianlancetaylor commented 10 years ago

Comment 1:

Labels changed: added repo-main, release-none.

minux commented 10 years ago

Comment 2:

I'd expect that using the best result of a few runs might introduce yet another kind of
flakiness, namely false positives. Compared to the false-negative flakiness we are
getting now, I'd rather have the latter.

rsc commented 7 years ago

For allocs, I agree that it would be nice to fix AllocsPerRun in some ideal world, although we're a bit stuck with it now. I'm also not sure we can build an API with no runs parameter: it seems like at the least you need a max count. If f is expensive then you might not want to run it very many times, and if f is unstable then you need to cut it off at some point. It might be nice to sketch out a func CountAllocs(f func()) int, but I'd be worried about these kinds of complications. In contrast, AllocsPerRun is very easy to specify and understand. There's no magic that can break.

For CPU, I think the number of times when you actually want just a ratchet is pretty low. Modern systems are weird enough that even the lowest possible observed time can be misleading. Maybe 99% of the time the top takes 5ns but occasionally the stars align just right and it takes 3ns. I've seen craziness like this. Then the min of all the runs is noisier than the average. I do think we should expose the underlying distribution, as in #19128, which is much better than any one number.

Given #19128, can we trim this issue down to being just about allocation counting?

josharian commented 7 years ago

> For CPU, I think the number of times when you actually want just a ratchet is pretty low.

Fair enough. And my benchmarking interests are probably atypical.

> Given #19128, can we trim this issue down to being just about allocation counting?

Yes.