evanphx / benchmark-ips

Provides iteration per second benchmarking for Ruby
MIT License
1.72k stars 97 forks source link

Implement automatic warmup timing #135

Open evanphx opened 10 months ago

evanphx commented 10 months ago

The idea is to run the item until the cycles per 100ms timing is within 1% of the previous run. This means that the default is still 2 seconds like it was originally, but now if those 2 seconds didn't yield runs that were close enough together, the warmup will continue to run.

It will run for a maximum of 30 seconds.

There are undoubtedly more sophisticated ways to terminate the warmup, but this is a simple metric that is easy for end users to understand.

Fixes #92

eregon commented 10 months ago

Auto-detecting warmup is always tempting and certainly convenient when it works and for instance I wrote this (PR) for yjit-bench. It starts with a 0.1% goal and uses the median absolute deviation over all iterations so far. This means it has a better idea of warmup because it considers all iterations and not just the last two (the current approach in this PR). It also increases the goal by 0.01% per iteration so that it allows more margin for more noisy benchmarks (and gives up after 20 minutes). It's not perfect though and e.g. can fail to find steady state if the benchmark is quite noisy or if there is a very long flat region (i.e. approximately as long as half of all iterations so far) which is not actually peak.

So I think the safest would be to not add auto warmup to benchmark-ips, because it will fail in some cases and then it would be misleading. If we still want to add it then I think we should try to detect when it fails and warn in such a case and check it on many benchmarks and Ruby implementations.

evanphx commented 10 months ago

@eregon Good context. Perhaps another way to look at this is, by having a simple autowarm by default that isn't sophisticated, it's better than the default (strictly 2 seconds) without making people overly confident that it's always correct.

Folks testing JITs, etc are always going to want to specify the warmup, I don't we could come up with a general purpose algorithm that would work for those folks and folks using benchmark-ips to just test out 2 different chunks of code.

This would be aimed at that later group, where the warmup can catch potential areas where their default warmup wasn't sufficiant.

eregon commented 10 months ago

I think these two groups overlap quite a bit, e.g., folks just using benchmark-ips and running on YJIT. Or of course folks just using benchmark-ips and running on JRuby/TruffleRuby (although I guess less frequent).

For CRuby interpreter no-JIT, then autowarmup is not needed, and the default 2 seconds is fine. For YJIT it's less clear, we would need to look at warmup curves on bigger benchmarks and see if autowarmup detects it correctly there. On small benchmarks, the 2 seconds should be enough for YJIT and probably for TruffleRuby Native too, while it might not be enough for TruffleRuby JVM and JRuby (where it's likely most difficult to autowarmup correctly).

I think we'd need to look at some bigger benchmarks and see if the autowarmup is better than the default 2 seconds.

nateberkopec commented 9 months ago
  1. Should there me a message printed if you hit the full 30 seconds? It's kind of a "failure" in some sense.
  2. 1% feels aggressive to me. The only time I see 1% variance in a benchmark-ips result is when I'm testing the most braindead low-level Ruby stuff... certainly any Rails bench involving database interaction at all usually ends up with a few % in either direction. Maybe changing the goal over time (as eregon suggested) is the solution.

CRuby interpreter no-JIT, then autowarmup is not needed, and the default 2 seconds is fine.

Plenty of people use benchmark-ips to test application-level code which may be ~0.5-1second per iteration. I like autowarmup because it will automatically give these users longer, more useful warmups.