bench: use GOAMD64=v3, increase benchmark count, interleave runs

thepudds commented 8 months ago

An individual benchmark run can be "unlucky" in terms of how many overflow buckets the runtime map ends up with, how skewed the group occupancy is in swiss.Map, and so on.

To help with noise, and hopefully make it easier to compare results across different commits, we update bench.sh to do 20 runs per benchmark rather than 10, but reduce the time per benchmark run from 1 second to 0.5 seconds to keep the total time roughly the same. More samples also give benchstat more grist for its statistical mill when identifying outliers, and they halve the weight of any single sample.
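
As a rough illustration of what this change means in terms of standard `go test` benchmark flags (the variable names are hypothetical, not taken from bench.sh): the per-benchmark time budget stays about the same because 10 x 1s and 20 x 0.5s both come to 10s, ignoring the small calibration overhead each sample pays while the testing package ramps up `b.N`.

```shell
#!/usr/bin/env bash
# Illustrative sketch (not the literal bench.sh): the sampling change expressed
# as standard `go test` benchmark flags. COUNT and BENCHTIME are hypothetical names.
set -euo pipefail

COUNT=20        # was 10: twice as many samples for benchstat to work with
BENCHTIME=0.5s  # was 1s: half as long per sample

# Time budget per benchmark, in ms, for a given count and per-sample duration.
# 10 x 1000ms and 20 x 500ms are both 10000ms, so total runtime stays about
# the same (each sample also pays a little b.N calibration overhead).
budget_ms() {
  echo $(( $1 * $2 ))
}

# -run='^$' matches no tests, so only benchmarks execute.
GO_TEST_ARGS=(-run='^$' -bench=. "-count=${COUNT}" "-benchtime=${BENCHTIME}")

# Usage: go test "${GO_TEST_ARGS[@]}" | tee bench.txt
```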

We also interleave the swiss.Map and runtime map runs. This can help reduce the impact of background processes, or of "noisy neighbors" in a public cloud environment, because a transient slowdown is then more likely to affect both implementations rather than just one. We run swiss.Map first since it is usually the more interesting data (e.g., in case someone wants to spot-check initial results or hit ^C early).
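
A minimal sketch of the interleaving, assuming a prebuilt test binary (as with `./swiss.test`) and two hypothetical `-bench` patterns, `swissMap` and `runtimeMap`, that select each implementation's benchmarks. Each loop iteration takes one sample of each implementation, swiss.Map first, and appends it to a per-implementation results file. The PR may take a few samples per invocation rather than exactly one.

```shell
#!/usr/bin/env bash
# Sketch of interleaved benchmark runs (not the literal bench.sh).
# Assumptions: BENCH is a prebuilt test binary (e.g. `go test -c -o swiss.test`),
# and SWISS_RE / RUNTIME_RE are hypothetical -bench patterns that select the
# swiss.Map and runtime map benchmarks respectively.
set -euo pipefail

BENCH=${BENCH:-./swiss.test}
SWISS_RE=${SWISS_RE:-swissMap}        # hypothetical pattern
RUNTIME_RE=${RUNTIME_RE:-runtimeMap}  # hypothetical pattern
RUNS=${RUNS:-20}                      # samples per implementation

# Alternate one sample of each implementation per iteration, so a burst of
# background load tends to affect both result files instead of just one.
interleave() {
  local out_swiss=$1 out_runtime=$2 i
  : > "$out_swiss"     # truncate previous results
  : > "$out_runtime"
  for ((i = 1; i <= RUNS; i++)); do
    # swiss.Map first: it is usually the more interesting data if the
    # run is interrupted early with ^C.
    "$BENCH" -test.run='^$' -test.bench="$SWISS_RE" \
      -test.count=1 -test.benchtime=0.5s >> "$out_swiss"
    "$BENCH" -test.run='^$' -test.bench="$RUNTIME_RE" \
      -test.count=1 -test.benchtime=0.5s >> "$out_runtime"
  done
}

# Usage: interleave swiss.txt runtime.txt
```

One side effect of invoking the binary once per sample is that every sample starts in a fresh process rather than sharing one long-lived runtime. The two result files still need their benchmark names reconciled before benchstat can compare them side by side.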

As a follow-up to #29, we also set the GOAMD64 environment variable based on the platform so the compiler can use more modern CPU instructions.
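
A sketch of setting `GOAMD64` from the host platform; the exact check in the PR may differ, and the helper name `goamd64_for` is hypothetical. `GOAMD64` is only consulted for `GOARCH=amd64` builds, and `v3` lets the compiler assume x86-64-v3 features such as AVX2, BMI1/BMI2, and FMA, so the resulting binaries will not run on older x86-64 CPUs that lack them. A more careful version could also check CPU flags (e.g., `avx2` in `/proc/cpuinfo` on Linux).

```shell
#!/usr/bin/env bash
# Sketch: pick a GOAMD64 level from the host platform (the PR's exact check
# may differ). GOAMD64 is only consulted for GOARCH=amd64 builds.
set -euo pipefail

# Print the GOAMD64 level to use for a given `uname -m` value, or nothing.
goamd64_for() {
  case "$1" in
    x86_64 | amd64)
      # v3 allows AVX2, BMI1/BMI2, FMA, etc.; binaries built this way
      # will not run on x86-64 CPUs that lack those features.
      echo v3
      ;;
    *)
      # e.g. arm64: GOAMD64 does not apply, so leave it unset.
      ;;
  esac
}

level=$(goamd64_for "$(uname -m)")
if [ -n "$level" ]; then
  export GOAMD64="$level"
fi
```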

While we are here, we also fix the munging of the results to account for the new benchmark name format that was introduced in #22.
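
The benchmark name format introduced in #22 is not shown here, so as a purely hypothetical illustration: if each benchmark name carried an `impl=swissMap` or `impl=runtimeMap` path segment, munging could strip that segment so that benchstat pairs the two implementations' results under the same name. The `munge` helper and the name format are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the "/impl=<name>/" segment is an assumed format,
# not necessarily the one introduced in #22.
set -euo pipefail

# Strip the implementation segment from benchmark names read on stdin, e.g.
#   BenchmarkMapGetHit/impl=swissMap/Int64/1024-8    ->  BenchmarkMapGetHit/Int64/1024-8
#   BenchmarkMapGetHit/impl=runtimeMap/Int64/1024-8  ->  BenchmarkMapGetHit/Int64/1024-8
# Lines without that segment (goos:, pkg:, PASS, ...) pass through unchanged.
munge() {
  sed -E 's#/impl=(swissMap|runtimeMap)/#/#'
}

# Usage: munge < swiss.txt > swiss.munged.txt
#        munge < runtime.txt > runtime.munged.txt
#        benchstat runtime.munged.txt swiss.munged.txt
```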

thepudds commented 8 months ago

Hi @petermattis, I won't be offended if you'd prefer to keep bench.sh simpler. 😅

Also, I don't have a macOS laptop immediately handy, so sorry in advance if there is a macOS quirk here due to an older Bash version or similar. (I did run this on the GitHub macOS CI runners, but I think those might still be amd64, and I did not check which OS version they use.)

petermattis commented 8 months ago

I'd prefer to keep bench.sh simpler than this. The benchmarks aren't going to be run frequently, as I don't expect this code to change often. That said, I do appreciate the guidance on using GOAMD64=v3 and will be doing a longer bench run soon. I'm also intrigued by the interleaving of swiss.Map and runtimeMap runs. Is that impactful? Another potentially impactful aspect of what you've done here is that you're only running a few iterations for each invocation of ./swiss.test. That means every benchmark starts from a "clean" runtime environment. I wonder if that makes any difference. I'm intrigued enough that I might do a benchmark run with this technique just to see.

cockroachdb/swiss #31