fastruby / fast-ruby

:dash: Writing Fast Ruby :heart_eyes: -- Collect Common Ruby idioms.
https://github.com/fastruby/fast-ruby

Change benchmark template to be more impartial. #215

Open nirvdrum opened 1 year ago

nirvdrum commented 1 year ago

By defining methods fast and slow, we lock the benchmark into whatever the results were on a particular Ruby at a particular time. Since the benchmarks are run on multiple versions of Ruby and on multiple Ruby interpreters, which variant is faster is subject to change.

I don't love the names "a" and "b", so I'm very much open to alternative naming conventions, as long as the method name either describes the variant being benchmarked or is an impartial name that simply indicates it's another variant.
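
As a sketch, an impartial template might name each variant after what it does rather than its expected ranking. The method and report names below are hypothetical, and the harness mirrors the benchmark-ips style this repo's benchmarks use:

```ruby
# Load benchmark-ips if available; the variant definitions work without it.
begin
  require "benchmark/ips"
rescue LoadError
  # benchmark-ips gem not installed; skip the measurement section below.
end

# Each variant is named for its approach, not for which one "wins".
def map_with_block(ary)
  ary.map { |x| x * x }
end

def each_with_push(ary)
  out = []
  ary.each { |x| out << x * x }
  out
end

if defined?(Benchmark::IPS)
  ary = (1..100).to_a
  Benchmark.ips do |x|
    x.report("Array#map")       { map_with_block(ary) }
    x.report("Array#each + <<") { each_with_push(ary) }
    x.compare!
  end
end
```

With this shape, whichever variant wins on a given Ruby, the names stay accurate.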

Joshfindit commented 1 year ago

I think that from a more holistic view, fast and slow are ideal for a couple of reasons:

  1. The person submitting the test has typically done so because they’ve found a clear difference in speed between two methods
  2. It’s clear for people who look at the test results later
  3. It allows for results to be “checked in” via commits, and can therefore show changes over time

I say holistic, because there are some pieces missing:

  1. Methods can be faster or slower depending on a lot of factors that have nothing to do with the code being called (including but not limited to processor optimizations, interpreter optimizations, or even changes to the underlying code as part of Ruby upgrades)
  2. The terms fast and slow have to be coerced if there are more than two options (does someone add fastest? slower? Do they buck the template altogether and simply name the tests?)

Personally, I think that the bigger solution is to increase the automated testing until it can provide data from all different environments and not only show outliers but also flag times where fast(est) is actually slower. At that point there could be automatic reports for end users who don’t want to have to review all data while asking questions like “What’s best practice for iterating through an array of objects [when I’m using Ruby 3.1.x]?”.
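
The flagging idea above could be sketched as a simple post-processing check over recorded results. The data shape (benchmark name mapped to iterations-per-second keyed by the template's method names) is an assumption for illustration:

```ruby
# Hypothetical check: flag benchmarks whose "fast" variant did not
# actually beat "slow" on the Ruby that produced these measurements.
def stale_labels(results)
  results.select do |_benchmark, ips_by_name|
    fast = ips_by_name["fast"]
    slow = ips_by_name["slow"]
    fast && slow && fast <= slow
  end.keys
end

# Example measurements (invented numbers, iterations per second).
measurements = {
  "array-concat"  => { "fast" => 1_200_000, "slow" => 800_000 },
  "string-interp" => { "fast" => 500_000,   "slow" => 650_000 },
}

puts stale_labels(measurements).inspect
```

A report like this, run per Ruby version in CI, would surface exactly the inversions being described.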

@nirvdrum I’d love to hear your thoughts or rebuttals in response to my thoughts. I’m much more interested in the betterment of this project than in any specific suggestion I have.

nirvdrum commented 1 year ago

I forgot to mention this is a follow-up to an issue I opened long ago #110.

I've looked at many benchmarks over the years and this is the only project that attempts to name the variants in terms of winners. I don't think this is an area we need to try to innovate in.

CI currently tests against 13 different Ruby versions and I expect we'll see more configurations to test YJIT or MJIT. They do not all perform the same way. It's confusing when you run into a benchmark where something named fast is slower than something named slow or where there's no significant performance difference. That can lead someone to conclude they ran the benchmarks incorrectly, not that the benchmark's naming convention has grown stale. To the best of my knowledge, no one is going back and updating these benchmarks to rename based on current performance; it's a maintenance burden to do so.

I'm also concerned this biases the benchmarks before they're even written because the name needs to be chosen before the benchmark is run. I'm sure someone studious enough is going to go back and rename things once results are available, but having to rename at all introduces a new potential source for errors. It just feels inverted to me. The goal of a benchmark is to evaluate performance of an approach and possibly compare it to a baseline to establish a performance differential. You're testing hypotheses, not trying to document something you already know.

Many of the benchmarks in this project compare multiple approaches to performing the same logical task. Eliminating the difference between those approaches is a performance target for each of the Ruby implementations. In very few cases are the benchmarks here highlighting a fundamental issue that will hold for all time. This repo is a good source for identifying improvements to Ruby implementations, and we should expect the performance profile to change over time.

Renaming all existing benchmarks is a large undertaking. That's why I never really made progress on #110, but it'd be helpful to break the pattern for newly added benchmarks, IMHO.

nirvdrum commented 1 year ago

I forgot to address one of your open questions: there are indeed benchmarks that then add "faster" and "fastest" names. It further confuses the relationships when the performance profile changes.

Grepping through the code quickly, we have:

I suppose that hits on another issue with the README. It gives a pattern for only two variants. When there are three or more, it leads the author to assume comparative forms of "fast" and "slow" should be used. That ends up meaning "slow" isn't always the slowest option and "fast" isn't always the fastest.

While I don't love the "a", "b", "c"... convention, at least it's reasonably clear what you should do when adding a new variant. Moreover, if modifying an existing benchmark, this naming convention doesn't require renaming all of the others to adapt to the new performance relationship.
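
For three or more variants, descriptive names sidestep the "faster"/"fastest" coercion entirely. A hypothetical sketch (method and report names invented, harness in the benchmark-ips style):

```ruby
begin
  require "benchmark/ips"
rescue LoadError
  # benchmark-ips gem not installed; the definitions below still load.
end

# Three ways to join two strings; names describe each approach, so adding
# a fourth variant later requires no renaming of the existing ones.
def concat_plus(a, b)
  a + " " + b
end

def concat_shovel(a, b)
  a.dup << " " << b
end

def concat_interp(a, b)
  "#{a} #{b}"
end

if defined?(Benchmark::IPS)
  Benchmark.ips do |x|
    x.report("String#+")      { concat_plus("foo", "bar") }
    x.report("String#<<")     { concat_shovel("foo", "bar") }
    x.report("interpolation") { concat_interp("foo", "bar") }
    x.compare!
  end
end
```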

eregon commented 1 year ago

I agree. I think the name should simply reflect what the method does; whether it's faster or slower is not a property of the method but of the Ruby optimizations, Ruby implementation, platform, etc. So it's just misleading to name a method fast if it turns out slower or the same speed in many cases (as it is on TruffleRuby, for example).