proposal: testing: add Keep, to force evaluation in benchmarks

aclements commented 1 year ago

Benchmarks frequently need to prevent certain compiler optimizations that may optimize away parts of the code the programmer intends to benchmark. Usually, this comes up in two situations where the benchmark use of an API is slightly artificial compared to a “real” use of the API. The following example comes from @davecheney's 2013 blog post, How to write benchmarks in Go, and demonstrates both issues:

func BenchmarkFib10(b *testing.B) {
    // run the Fib function b.N times
    for n := 0; n < b.N; n++ {
        Fib(10)
    }
}

Most commonly, the result of the function under test is not used because we only care about its timing. In the example, since Fib is a pure function, the compiler could optimize away the call completely. Indeed, in “real” code, the compiler would often be expected to do exactly this. But in benchmark code, we’re interested only in the side-effect of the function’s timing, which this optimization would destroy.
An argument to the function under test may be unintentionally constant-folded into the function. In the example, even if we addressed the first issue, the compiler may compute Fib(10) entirely at compile time, again destroying the benchmark. This is more subtle because sometimes the intent is to benchmark a function with a particular constant-valued argument, and sometimes the constant argument is simply a placeholder.

There are ways around both of these, but they are difficult to use and tend to introduce overhead into the benchmark loop. For example, a common workaround is to add the result of the call to an accumulator. However, there’s not always a convenient accumulator type, this introduces some overhead into the loop, and the benchmark must then somehow ensure the accumulator itself doesn’t get optimized away.

In both cases, these optimizations can be partial, where part of the function under test is optimized away and part isn’t, as demonstrated in @eliben’s example. This is particularly subtle because it leads to timings that are incorrect but also not obviously wrong.

Proposal

I propose we add the following function to the testing package:

package testing

// Keep returns its argument. It ensures that its argument and result
// will be evaluated at run time and treated as non-constant.
// This is for use in benchmarks to prevent undesired compiler optimizations.
func Keep[T any](v T) T

(This proposal is an expanded and tweaked version of @randall77’s comment.)

The Keep function can be used on the result of a function under test, on arguments, or even on the function itself. Using Keep, the corrected version of the example would be:

func BenchmarkFib10(b *testing.B) {
    // run the Fib function b.N times
    for n := 0; n < b.N; n++ {
        testing.Keep(Fib(testing.Keep(10)))
    }
}

(Or testing.Keep(Fib)(10), but this is subtle enough that I don’t think we should recommend this usage.)

Unlike various other solutions, Keep also lets the benchmark author choose whether to treat an argument as constant or not, making it possible to benchmark expected constant folding.

Alternatives

Keep may not be the best name. This is essentially equivalent to Rust’s black_box, and we could call it testing.BlackBox. Other options include Opaque, NoOpt, Used, and Sink.
27400 asks for documentation of best practices for avoiding unwanted optimization. While we could document workarounds, the basic problem is Go doesn’t currently have a good way to write benchmarks that run afoul of compiler optimizations.
48768 proposes testing.Iterate, which forces evaluation of all arguments and results of a function, in addition to abstracting away the b.N loop, which is another common benchmarking mistake. However, its heavy use of reflection would be difficult to make zero or even low overhead, and it lacks static type-safety. It also seems likely that users would often just pass a func() with the body of the benchmark, negating its benefits for argument and result evaluation.
runtime.KeepAlive can be used to force evaluation of the result of a function under test. However, this isn’t the intended use and it’s not clear how this might interact with future optimizations to KeepAlive. It also can’t be used for arguments because it doesn’t return anything. @cespare has some arguments against KeepAlive in this comment.

willfaught commented 1 year ago

The https://github.com/golang/go/issues/61515 proposal does include a pointer to the latest version in the top post. We tend not to do significant rewrites of the top post in a proposal because then it makes it hard to follow the conversation that follows it, and instead add updates to it linking to the comment explaining the latest version. There's no really ideal way to do this.

@aclements Adding an hr divider at the end, followed by an "Edit: Changed to [...], see these comments [...]" line(s) would be sufficient. This is comparable to the practice of reserving the first comment for FAQs and proposal updates that I've seen the Go team use elsewhere recently, which worked well.

However, not exposing Keep limits refactoring opportunities

Can you demonstrate an example using Keep?

Oops, I guess his example doesn't technically show partial optimization since there's only one argument to the function under test. Partial optimization would mix one (or more) argument passed in the b.Loop body to the intermediate closure with or (or more) argument passed directly from the intermediate closure. The former would not be constant-propagated, while the latter could be.

Can you demonstrate an example using Keep?

rsc commented 1 year ago

The auto-Keep during b.Loop could be applied the same in any loop that counts from 0 to b.N where b has type *testing.B. Then b.Loop is less special - the compiler just applies it to both kinds of benchmark loops.

rsc commented 1 year ago

If we make Keep auto-apply inside b.N loops, then b.Loop is no longer special, and converting a b.N loop to a b.Loop loop is not a performance change at all, so it seems like we should make Keep auto-apply inside b.N loops. That will also fix a very large number of Go microbenchmarks, although it will look like it slowed them down.

That leaves the question of whether to auto-Keep at all. If we don't, we leave a big mistake that users, even experienced ones, will make over and over. The compiler can do the right thing for us here.

Maybe it would help if someone could try a compiler implementation and see how difficult this is.

aclements commented 1 year ago

However, not exposing Keep limits refactoring opportunities

Can you demonstrate an example using Keep?

Suppose you have a function like

func Benchmark(b *testing.B) {
    for b.Loop() {
        ... complicated body ...
    }
}

In this form, we're proposing that we automatically apply Keep to "complicated body". However, if you were to refactor out parts of the complicated body into a new function:

func Benchmark(b *testing.B) {
    for b.Loop() {
        ... less complicated body, calls helper ...
    }
}

func helper() {
    ... parts of complicated body ...
}

The code in helper no longer gets Keep applied automatically, so the programmer would need to call Keep themself.

Partial optimization would mix one (or more) argument passed in the b.Loop body to the intermediate closure with or (or more) argument passed directly from the intermediate closure.

Can you demonstrate an example using Keep?

I guess you don't actually need Keep to express partial optimization. Here's an example of partial optimization, where one argument to the benchmarked function can be constant-propagated, but the other is treated as a variable even though it's a constant.

func BenchmarkPowPartialConstant(b *testing.B) {
    xPow20 := func(a float64) float64 {
        return math.Pow(a, 20)
    }

    for b.Loop() {
        // Compute 5 ** 20. This allows the '20' to be constant-propagated,
        // but treats the '5' as a variable.
        xPow20(5)
    }
}

Another way to express this does use Keep:

func BenchmarkPowPartialConstant(b *testing.B) {
    xPow := func() float64 {
        return math.Pow(math.Keep(5), 20)
    }

    for b.Loop() {
        // Compute 5 ** 20. This is defined separately to avoid auto-Keep
        // so the '20' can be constant-propagated.
        xPow()
    }
}

In both of these, xPow returns the result of math.Pow, so auto-Keep in the loop keeps the result. Otherwise, the call to math.Pow could be eliminated entirely since it's pure.

meling commented 1 year ago

Here is an ergonomic thing that doesn't seem to have been considered in the above discussion. When the benchmarked function has multiple input parameters and return values (of different types):

for i := 0; i < b.N; i++ {
    bb, _ = bbhash.New(gamma, partitions, keys)
}

It would be nice to be able to write this...

for i := 0; i < b.N; i++ {
    testing.Keep(bbhash.New(gamma, partitions, keys))
}

However, since the return types may be different, the proposed Keep function won't work for the above scenario.

A variadic variant like the one below would work for the return types:

func Keep(v ...any) []any {
    return v
}

But it doesn't work for the input arguments if one wants to avoid testing.Keep for each argument. Maybe that's not a relevant concern?

zigo101 commented 1 year ago

Just curious: is it required to let Keep return values?

[edit] okay, it looks the answer is yes.

aclements commented 1 year ago

@meling , that's an interesting point. Unfortunately, while I agree that makes Keep a little more ergonomic for that one case, it seems to make it less ergonomic for any case that needs to use the result of Keep, which includes using it on arguments to functions and using the return value from functions.

For example, to fully write out your example with the signature you proposed, you'd need to write:

for i := 0; i < b.N; i++ {
    r := testing.Keep(bbhash.New(gamma, partitions, keys))
    bb = r[0].(something)
}

That type assertion will also add a little benchmarking overhead.

With the original proposed signature, you would write this as:

for i := 0; i < b.N; i++ {
    bb, _ = bbhash.New(gamma, partitions, keys)
    testing.Keep(bb)
}

That doesn't seem very bad to me.

Also, the idea with auto-Keep is that in these cases you wouldn't have to write Keep at all.

meling commented 1 year ago

Yeah, I agree. testing.Keep(bb) is definitely reasonable.

The main reason I brought it up was that it didn't seem like it was discussed. Moreover, I had some initial idea that perhaps it would be valuable to have some form of "type matching" (for lack of a better term):

func Keep[T any{0,3}](v T) T

Where the T would match 0-3 input and corresponding output types, which could be different types (not sure this notation makes that clear enough, though). Something like this could also be helpful for the iter.Seq definition being considered.

However, I removed it since I didn't want to distract the first message with such a "controversial" proposal 😅... Don't know if this is worth writing up as a separate proposal.

rsc commented 1 year ago

Placed on hold. — rsc for the proposal review group

aclements commented 1 year ago

To be clear, this is on hold pending an experimental implementation, per https://github.com/golang/go/issues/61179#issuecomment-1728164629

randall77 commented 12 months ago

Just to loop back here, the proposal is to both:

Add a function Keep to the testing package
Auto-add calls to Keep in some situations

Number 1 seems pretty easy. Number 2 could be pretty difficult. This proposal is on hold for someone to prototype 2.

The question is, would we want to do 1 without also doing 2? Particularly, I think we could do 1 for 1.22 if we want. Getting 2 in for 1.22 looks more speculative.

But then there is a problem: having 1 and not 2 will encourage people to add lots of explicit testing.Keep calls that are unnecessary once 2 lands (because they would have been auto-added). So it sounds like a recipe for unnecessary churn. Hence, my thought is we hold on 1 as well until 2 is ready.

eliben commented 12 months ago

But then there is a problem: having 1 and not 2 will encourage people to add lots of explicit testing.Keep calls that are unnecessary once 2 lands (because they would have been auto-added). So it sounds like a recipe for unnecessary churn. Hence, my thought is we hold on 1 as well until 2 is ready.

Isn't this the case for runtime.KeepAlive already today? Benchmarks can call it or just use custom sink solutions to thwart optimizations.

IMHO we can do (1) first, in 1.22. Whatever calls to testing.Keep people will add is likely to fix many benchmarks, which is a positive development overall. Unless we plan to do (2) without doing (1) at all (exposing it to users), I vote for (1) in 1.22

randall77 commented 12 months ago

Isn't this the case for runtime.KeepAlive already today?

It is for function results, true. It can't be used to solve the function arg problem.

cespare commented 12 months ago

I'm concerned about losing momentum on this. Is there someone who is planning on doing the prototype work for auto-Keep?

If it is going to be years before we get auto-Keep, I'd much rather we decouple these and land Keep now. Then we at least have a path forward to eliminate the hacky workarounds (KeepAlive, sinks) while we wait for the better thing which may or may not happen.

eliben commented 12 months ago

Isn't this the case for runtime.KeepAlive already today?

It is for function results, true. It can't be used to solve the function arg problem.

Yes, my argument isn't that runtime.KeepAlive gives us what we want; on the contrary, I'm arguing that the problem of "people will just splat Keep all over the place" already exists today with runtime.KeepAlive, so I don't think this is a good reason to delay shipping testing.Keep as soon as we can.

rsc commented 11 months ago

@eliben and @cespare, I hear your concern, but there are only a few weeks left in the Go 1.22 cycle, Austin is away, and we don't have consensus on the actual design yet, nor an implementation. This will have to wait for Go 1.23.

jdemeyer commented 2 days ago

In rare cases, this affects test output also, where optimizations in the test code cause subtle changes in floating-point output. This is a reduced test case from a much larger program:

package main

import (
    "fmt"
    "runtime"
)

func keep[T any](x T) T {
    runtime.KeepAlive(&x)
    return x
}

func main() {
    fmt.Printf("%g\n", foo(keep(0.85)))
    fmt.Printf("%g\n", foo(0.85))
}

func foo(x float64) float64 {
    return 3 * x - 2
}

On ARM64, this outputs two different numbers (depending on whether keep is used):

0.5499999999999999
0.5499999999999998

jdemeyer commented 2 days ago

It can't be used to solve the function arg problem.

runtime.KeepAlive(&arg) seems to work as in the example above

golang / go

proposal: testing: add Keep, to force evaluation in benchmarks #61179

Proposal

Alternatives

27400 asks for documentation of best practices for avoiding unwanted optimization. While we could document workarounds, the basic problem is Go doesn’t currently have a good way to write benchmarks that run afoul of compiler optimizations.