
proposal: testing: add Keep, to force evaluation in benchmarks #61179

Open aclements opened 1 year ago

aclements commented 1 year ago

Benchmarks frequently need to prevent certain compiler optimizations that may optimize away parts of the code the programmer intends to benchmark. Usually, this comes up in two situations where the benchmark use of an API is slightly artificial compared to a “real” use of the API. The following example comes from @davecheney's 2013 blog post, How to write benchmarks in Go, and demonstrates both issues:

func BenchmarkFib10(b *testing.B) {
    // run the Fib function b.N times
    for n := 0; n < b.N; n++ {
        Fib(10)
    }
}
  1. Most commonly, the result of the function under test is not used because we only care about its timing. In the example, since Fib is a pure function, the compiler could optimize away the call completely. Indeed, in “real” code, the compiler would often be expected to do exactly this. But in benchmark code, we’re interested only in the side-effect of the function’s timing, which this optimization would destroy.

  2. An argument to the function under test may be unintentionally constant-folded into the function. In the example, even if we addressed the first issue, the compiler may compute Fib(10) entirely at compile time, again destroying the benchmark. This is more subtle because sometimes the intent is to benchmark a function with a particular constant-valued argument, and sometimes the constant argument is simply a placeholder.

There are ways around both of these, but they are difficult to use and tend to introduce overhead into the benchmark loop. For example, a common workaround is to add the result of the call to an accumulator. However, there’s not always a convenient accumulator type, this introduces some overhead into the loop, and the benchmark must then somehow ensure the accumulator itself doesn’t get optimized away.
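
For illustration, here is a sketch of that accumulator workaround as benchmarks typically write it today (the package-level sink variable is a common convention, not part of any API):

var sink int

func BenchmarkFib10Sink(b *testing.B) {
    // Accumulating into a global keeps each call observable, but it
    // adds work to every iteration, and sink itself must somehow be
    // kept from being optimized away.
    for n := 0; n < b.N; n++ {
        sink += Fib(10)
    }
}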

In both cases, these optimizations can be partial, where part of the function under test is optimized away and part isn’t, as demonstrated in @eliben’s example. This is particularly subtle because it leads to timings that are incorrect but also not obviously wrong.

Proposal

I propose we add the following function to the testing package:

package testing

// Keep returns its argument. It ensures that its argument and result
// will be evaluated at run time and treated as non-constant.
// This is for use in benchmarks to prevent undesired compiler optimizations.
func Keep[T any](v T) T

(This proposal is an expanded and tweaked version of @randall77’s comment.)

The Keep function can be used on the result of a function under test, on arguments, or even on the function itself. Using Keep, the corrected version of the example would be:

func BenchmarkFib10(b *testing.B) {
    // run the Fib function b.N times
    for n := 0; n < b.N; n++ {
        testing.Keep(Fib(testing.Keep(10)))
    }
}

(Or testing.Keep(Fib)(10), but this is subtle enough that I don’t think we should recommend this usage.)

Unlike various other solutions, Keep also lets the benchmark author choose whether to treat an argument as constant or not, making it possible to benchmark expected constant folding.
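
For example (a sketch assuming the proposed Keep), wrapping only the result leaves the argument visible as a constant, so the benchmark measures Fib with the expected folding:

func BenchmarkFib10Folded(b *testing.B) {
    for n := 0; n < b.N; n++ {
        // Keep the result, but let the compiler fold the constant 10.
        testing.Keep(Fib(10))
    }
}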

Alternatives

eliben commented 1 year ago

Bikeshedding the name aside (Keep SGTM), I really like this proposal.

It's simple and sufficient. It doesn't prevent us from working on a different API like the proposed Iterate in the future.

rsc commented 1 year ago

If we planned to do Iterate, we might not want to also do Keep. That said, I think the drawbacks listed above for Iterate are quite serious and we should simply not do it.

aclements commented 1 year ago

It doesn't prevent us from working on a different API like the proposed Iterate in the future.

FWIW, I'm planning to file another proposal for an API that covers just the looping aspect of Iterate and would complement Keep.

earthboundkid commented 1 year ago

Doing Iterate would cause a lot of churn as all the old benchmarks are rewritten into the new style. They would have different nesting depths, which makes the diffs harder to ignore. Adding testing.Keep (or should it be b.KeepAlive()?) would cause minimal churn: just a bunch of globalSinks and runtime.KeepAlives would be replaced, and only those specific lines would be affected.

rsc commented 1 year ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

bcmills commented 1 year ago

Unlike various other solutions, Keep also lets the benchmark author choose whether to treat an argument as constant or not, making it possible to benchmark expected constant folding.

Note that that is also possible with the API proposed in #48768 by closing over the intentionally-constant values:

func BenchmarkFibConstant10(b *testing.B) {
    b.Iterate(func() int {
        return Fib(10)
    })
}

It seems to me that this proposal and #48768 are equally expressive, and the key difference is just whether constant-propagation is opt-in (#48768) or opt-out (this proposal).

bcmills commented 1 year ago

However, [Iterate's] heavy use of reflection would be difficult to make zero or even low overhead

As I have repeatedly stated on #48768, I believe that there are several viable ways to overcome that overhead.

I am becoming somewhat frustrated that https://github.com/golang/go/issues/48768#issuecomment-937003496 in particular seems to have been ignored. I may not be on the Go compiler team, but I am well acquainted with compiler optimization techniques, and so far nobody has explained why those techniques would not apply in this case.

bcmills commented 1 year ago

and it lacks static type-safety.

While that is true, any type mismatch errors would be diagnosed immediately if the benchmark is ever actually run, and a similar lack of type safety was not seen as a significant barrier for the closely-related fuzz testing API (#44551).

rsc commented 1 year ago

@bcmills, my experience over >25 frustrating years of trying to benchmark things is that, in general, attempting to subtract out per-loop overhead sounds good in theory, but in practice that overhead can and often does include various random noise. And the more overhead there is, the louder the noise. This means if you are trying to benchmark a very short operation, then subtracting out a (different) reflect.Call measurement is very likely to destroy the measurement, perhaps even making it negative. The best approach we have for getting the most reliable numbers we can is to introduce as little overhead as possible to begin with.

For the trivial loop for i := 0; i < b.N; i++, we just ignore the overhead of the i++ and i < b.N entirely and include it as part of the thing being measured. This turns out to be far more accurate than trying to subtract it out.

bcmills commented 1 year ago

  testing.Keep(Fib(testing.Keep(10)))

From what I can tell, this would require N+1 calls to Keep in order to benchmark a function with N arguments. Although N is usually fairly small, that still seems like a very noisy call site for even a modest number of arguments.

rsc commented 1 year ago

The main place where testing.Keep is needed is around the overall result. I write code to work around that all the time. It is far less common to need to worry about making the arguments opaque. I can't remember ever doing that.

rsc commented 1 year ago

I see now that you also mentioned making b.Iterate a compiler intrinsic. I suppose that is possible, but it seems very special-case. At that point it's basically a back-door language change, since either you can't do x := b.Iterate; x(Fib, 10) or it does something very different from b.Iterate(Fib, 10).

bcmills commented 1 year ago

It is far less common to need to worry about making the arguments opaque. I can't remember ever doing that.

I expect that that will become more common as the compiler gets better at inlining. That said, it is also more straightforward to work around (without new API) today, such as by alternating among multiple entries in a slice of inputs.
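
A sketch of that slice-of-inputs workaround (the inputs and the sink variable here are illustrative, not from the proposal):

var fibSink int

func BenchmarkFibVariedArg(b *testing.B) {
    inputs := []int{9, 10, 11}
    for n := 0; n < b.N; n++ {
        // Cycling through several values defeats constant propagation
        // of the argument without any new API.
        fibSink += Fib(inputs[n%len(inputs)])
    }
}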

bcmills commented 1 year ago

I agree that subtracting out the overhead from a naive implementation based on reflect.Call does not seem viable.

Making b.Iterate itself a compiler intrinsic is one possible alternative, although I agree that the implication for Iterate as a method-value is unfortunate.

I think probably the most promising approach is an implementation that sets up a stack frame with arguments and then repeatedly invokes the function starting from that frame. It isn't obvious to me whether the reflect.Caller API in #49340 is sufficient for that or if it would need some other hook, but even in that case the hook could be provided in internal/reflectlite or a similar internal package.

rsc commented 1 year ago

The stack frame implementation would not be able to set up the arguments just once. It would have to set them up on every iteration, since in general a function can overwrite its arguments, and many do. reflect.Caller would amortize the allocation but not the setup.

aclements commented 1 year ago

All good points, @bcmills.

From what I can tell, this would require N+1 calls to Keep in order to benchmark a function with N arguments. Although N is usually fairly small, that still seems like a very noisy call site for even a modest number of arguments.

I'm not sure if you're referring to "line noise" here (which, I agree, this does introduce a fair amount of line noise) or measurement noise. For the latter, a naive implementation of testing.Keep will introduce just a CALL/RET pair, and that we could easily optimize away either by making the compiler recognize no-op functions, or by making it recognize this particular function. Intrinsify-ing this function seems more palatable than intrinsify-ing Iterate, though that's just my opinion.
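
For concreteness, a naive non-intrinsic Keep could be a sketch like this, where the go:noinline directive forces exactly the CALL/RET pair mentioned above:

// Keep returns its argument. Forcing a real call is enough to hide
// the value from the optimizer; the proposal would additionally
// teach the compiler to elide even this call.
//
//go:noinline
func Keep[T any](v T) T {
    return v
}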

Making b.Iterate itself a compiler intrinsic is one possible alternative, although I agree that the implication for Iterate as a method-value is unfortunate.

Another possible option is that we make sure b.Iterate can be inlined and then teach the compiler how to eliminate a reflect.Call of a statically-known function. That feels less "special" than teaching it about b.Iterate. I'm not sure this is a good option, though, since it would also have to figure out the loop that sets up the reflect.Value arguments, and have to deal with the type-checking that reflect would be doing at run-time.

I'm not that concerned about people capturing b.Iterate (or any alternative) as a method value.

We already do some code generation for tests. Is there anything we could code-generate to help with this? We don't rewrite any test code right now, so this might require pushing that too far.

The stack frame implementation would not be able to set up the arguments just once. It would have to set them up on every iteration, since in general a function can overwrite its arguments, and many do.

Not to mention, I would expect most or all of the arguments to be passed in registers. We would certainly have to re-initialize those.

bcmills commented 1 year ago

What I had in mind is something like two functions: a (somewhat expensive) setup hook that checks types and copies function arguments from a slice into a more compact argument block, and a (cheap) “invoke” hook that initializes the call stack frame, copies arguments into registers, and calls the function.

The argument block might look something like:

+------------------------------------------------+
| pointer to GC map for argument block           |
+------------------------------------------------+
| function address                               |
+------------------------------------------------+
| closure address                                |
+------------------------------------------------+
| # of integer argument registers                |
+------------------------------------------------+
| # of FP argument registers                     |
+------------------------------------------------+
| spill + stack-assigned result area size        |
+------------------------------------------------+
| stack-assigned argument area size              |
+------------------------------------------------+
| integer register arguments                     |
|          ...                                   |
+------------------------------------------------+
| FP register arguments                          |
|          ...                                   |
+------------------------------------------------+
| stack-assigned arguments                       |
|          ...                                   |
+------------------------------------------------+

The implementation of iterate would be something like:

func (b *B) Iterate(f any, args ...any) {
    call := reflectlite.NewCall(f, args...)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        call.Invoke()
    }
}

where call.Invoke() is similar in cost to a non-inlined function call, but perhaps with a couple of extra jumps to deal with argument-register counts that aren't compile-time constants.

That seems like it might be easier than teaching the compiler to inline through reflect.Call proper, but still a lot less run-time overhead than reflect.Call. (And it still leaves the door open for the inliner to figure out call.Invoke without trying to reason its way through all of the type-checking code that precedes it.)

rittneje commented 1 year ago

As a developer I would much prefer the compiler be taught not to optimize away calls within a _test.go file instead of me having to remember to write a bunch of wrapper calls. I didn't see that listed in the alternatives, so my apologies if that has been proposed previously.

seebs commented 1 year ago

So I naively want the compiler not to optimize away things in a benchmark... but also some amount of the optimization happening would in fact be part of what the compiler would do running the code in reality, and thus, part of what I want to benchmark. The trick is distinguishing between optimizing-away the benchmark and optimizing-away some of the work inside the benchmark, which would also be optimized-away outside of the benchmark.

zigo101 commented 1 year ago

Another alternative name for Keep is eval.

This function is not only useful for testing, but also for non-testing code.

mateusz834 commented 1 year ago

I believe that another problem with testing.Iterate() would be with escape analysis. Right? It would cause heap allocations when returning pointer types, so it might cause bad benchmarking results.

bcmills commented 1 year ago

It would cause heap allocations when returning pointer types, so it might cause bad benchmarking results.

The existing ABI is such that if a function that returns a pointer to a new object is not inlined into the caller, the object to which it points must be heap-allocated. But that is part of the cost of calling the function; if you want to test how the function performs when it is fully inlined, you should benchmark an outer function that calls it in an inlineable way.
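
A sketch of that suggestion (newT and useT are hypothetical stand-ins for the function under test and its inlineable wrapper):

type T struct{ x int }

func newT() *T { return &T{x: 1} } // function under test

// useT calls newT in an inlineable way; if newT inlines here, the
// *T need not escape to the heap.
func useT() int { return newT().x }

var sinkInt int

func BenchmarkNewTInlined(b *testing.B) {
    for i := 0; i < b.N; i++ {
        // The usual caveats from this thread about constant folding
        // still apply; the sink only keeps the result observable.
        sinkInt += useT()
    }
}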

rsc commented 1 year ago

I am still concerned about the overhead of reflect in Iterate. We can't subtract it, and that means we can't reliably measure very short functions - which are the ones most likely to be affected by throwing away parts of the computation.

The compiler is going to be involved no matter what. What if it's more involved? Specifically, suppose we have a function that just does looping and takes func(), like

b.Loop(func() { Fib(10) })

or maybe

for range b.Loop {
   Fib(10)
}

and the compiler would recognize testing.B.Loop and apply Keep to every function and every argument in every call in that closure. We could still provide Keep separately and explain in the docs for Loop what the compiler is doing, in terms of Keep.

This would end up being like b.Iterate but (a) you get to write the actual calls, and (b) there is no reflect. On the other hand, the compiler does more work. But this is the Go compiler and the Go testing package; the compiler already knows about sync/atomic and math and other packages.

For that matter we could also recognize for i := 0; i < b.N; i++ { ... } and do the same to that loop body (it might still help to have something like Iterate or Loop though).

aclements commented 1 year ago

I've just filed #61515, which I consider closely related to and complementary to this proposal and the Iterate proposal. I suggest we keep discussions of trade-offs between Iterate, Keep, and Loop in this issue.

cespare commented 1 year ago

To quote #61515:

  • b.Loop could be a clear signal to the compiler not to perform certain optimizations in the loop body that often quietly invalidate benchmark results.

To be explicit, does that proposal include this compiler special-casing? Or is that proposal only for the API change and then in the future we will take a separate decision regarding special compilation of Loop?

You say that that proposal is complementary to this Keep proposal, but this specific change seems non-orthogonal. If we decide to do the compiler special-casing, that seems like it should bear on our decision about whether to expose Keep to the user at all.

rsc commented 1 year ago

We are considering Keep and Loop-with-implicit-Keep together. (And the place for that combined discussion is this issue, not #61515.)

If we do the special casing, then we basically have to expose Keep too, so that we can explain what the special casing does and provide a way for users to have that same power themselves.

rsc commented 1 year ago

To summarize the current state, the idea is to have Keep(x) return x but "hide" it from the compiler and disable throwing it away, so you can use Keep(f(Keep(x))) to both make sure f's result calculation is not optimized away and to keep the compiler from specializing an inlined copy of f to handle just x.

Then, over on #61515, we have a proposal to define b.Loop() that returns bool and is used like:

for b.Loop() { ... }

instead of

for i := 0; i < b.N; i++ { ... }

The nice thing about b.Loop is that the testing package can run code inside b.Loop to time groups of iterations separately, so that for example b.Loop could return true 10 times and see how long those iterations took, and then return true 100 more times and see how long those took, all without breaking the loop. This would remove the need to call a benchmark function more than once, and it would remove the need for b.ResetTimer - the only timing would be while the for loop is running. Setup and teardown would automatically not be counted.
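
As a toy sketch of that mechanism (the type, its fields, and the batch-growth policy are invented here for illustration; it assumes the time package), the testing package could drive the loop like this:

type looper struct {
    done   uint64    // iterations completed so far
    target uint64    // total iterations scheduled
    start  time.Time // start of the current batch
}

func (l *looper) Loop() bool {
    if l.done == l.target {
        if l.target == 0 {
            l.target = 10 // first, small batch
        } else {
            elapsed := time.Since(l.start) // how long this batch took
            if elapsed > time.Second || l.target >= 1e9 {
                return false // enough data; the for loop simply exits
            }
            l.target *= 10 // schedule a larger batch
        }
        l.start = time.Now()
    }
    l.done++
    return true
}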

And then on top of that, the compiler would recognize a for loop around b.Loop() and edit any calls inside the { ... } loop body to insert Keep around the result of the call and each argument.

With all that, a working, accurate benchmark for, say, unicode.IsSpace, would be:

func BenchmarkIsSpace(b *testing.B) {
    setup() // no setup really needed here but in general...
    for b.Loop() {
        unicode.IsSpace('x')
    }
    teardown() // same...
}

When users learn the pattern of using b.Loop, their benchmarks are easier to write and report real numbers.

This would be rewritten by the compiler to:

func BenchmarkIsSpace(b *testing.B) {
    setup() // no setup really needed here but in general...
    for b.Loop() {
        testing.Keep(unicode.IsSpace(testing.Keep('x')))
    }
    teardown() // same...
}

It might be better to rename Keep to Use too, but for clarity I've written this comment with Keep.

bcmills commented 1 year ago

So, if we wanted to benchmark inlining of unicode.IsSpace with the constant argument 'x', I guess that would be written as:

func BenchmarkIsSpaceInlinedConstant(b *testing.B) {
    setup() // no setup really needed here but in general...

    // Enable inlining by defining this outside of the b.Loop body.
    xIsSpace := func() bool {
        return unicode.IsSpace('x')
    }

    for b.Loop() {
        // Benchmark the call with the 'x' argument inlined.
        xIsSpace()
    }

    teardown() // same...
}

?

bcmills commented 1 year ago

I'm still not real fond of the “b.Loop body is compiler magic” aspect of that approach, but I will admit that that's just an aesthetic preference, and at some point we're dealing with compiler magic no matter how we slice it. 🤷‍♂️

aclements commented 1 year ago

So, if we wanted to benchmark inlining of unicode.IsSpace with the constant argument 'x', I guess that would be written as:

I believe your example is right. It's awkward, but I think the only way to make it non-awkward is what we have today. Given that I'm pretty sure intentional constant propagation in benchmarks is extremely rare, it seems like the right balance to make the common intent (no constant propagation) the easy default, at the expense of making the rare intent awkward.

I'm still not real fond of the “b.Loop body is compiler magic” aspect of that approach

I'm a bit squeamish about this, too. But, it seems to me that there are the users who think about unintended optimization in benchmarks, and the users who don't. With a little compiler magic, we can just solve this problem for the users who don't think about it. And for the users who do think about it, hopefully they can also learn about the deoptimization effect of b.Loop. Also, there's no harm in continuing to do the sorts of "manual" deoptimization that people do today. My main concern is that refactoring of code within a b.Loop could have surprising effects on the result of a benchmark. I think attaching it to b.Loop is a lot more robust to refactoring than, say, deoptimizing the body of Benchmark functions, though.

rsc commented 1 year ago

It seems like the choice generally is between "thing people forget to use" and "thing that is kind of magic". Given that choice it seems like we should prefer the second. Or at least I prefer the second, because it will mean that benchmarks are more reliable.

There is no perfect solution here. We have to pick one of those two choices.

I agree with the example above but I would have written

func BenchmarkIsSpaceInlinedConstant(b *testing.B) {
    setup() // no setup really needed here but in general...
    for b.Loop() {
        func() {
            unicode.IsSpace('x')
        }()
    }
    teardown() // same...
}

I suspect that will become a pattern, and it seems fine.

bcmills commented 1 year ago

I don't really understand why

    for b.Loop() {
        func() {
            unicode.IsSpace('x')
        }()
    }

would work to enable the optimization — it is still lexically within the loop body.

If it does work, then I'm not sure I can clearly describe the region within which optimizations are disabled. 😅

rsc commented 1 year ago

Good point @bcmills. We need a precise definition of what gets Keep added inside the loop body.

It sounds like maybe we want "Keep is applied around all function results and all function arguments appearing anywhere lexically inside the loop body", so that F(10) becomes Keep(F(Keep(10))). So my "idiom" would not in fact become an idiom, or at least it would not do anything useful.

An alternative would be to apply it around all expressions, so that F(10) becomes Keep(Keep(F)(Keep(10))), but we probably don't want that, because often we do want F itself to be inlined in the benchmark if it would be inlined at the call site.

rsc commented 1 year ago

It sounds like maybe we want "Keep is applied around all function results and all function arguments appearing anywhere lexically inside the loop body",

Do I have that right? Do people agree with this?

aclements commented 1 year ago

Do I have that right? Do people agree with this?

This seems reasonable to me.

(That said, while I think it's important to agree on a definition we can implement and communicate, I don't feel like the exact details matter that much. I think any reasonable definition will work for the vast majority of code, and in the unusual cases where the details matter, users can follow whatever definition we provide.)

willfaught commented 1 year ago

Wouldn't it be simpler to just disable the appropriate optimizations when compiling benchmark functions (not including the functions they call)? I think most users would accept worse emitted code for these funcs knowing that it was for better benchmark accuracy. Then we wouldn't need a special stdlib function with documentation explaining its purpose, etc. If you need better emitted code for some parts of the benchmark for some reason, then wrap it in a helper function, which is optimized like normal:

func BenchmarkFoo(b *testing.B) {
    lotsOfSetupWork(b)
    for n := 0; n < b.N; n++ {
        Fib(10)
    }
    lotsOfTeardownWork(b)
}

aclements commented 1 year ago

@willfaught , we've leaned away from that for two reasons:

  1. I believe it's not uncommon (though I don't have data) for the b.N loop to be factored out of the Benchmark function itself, whereas I believe the actual body of the benchmark is basically always lexically within the loop.
  2. It may cause surprising changes in existing benchmarks. Those benchmarks are perhaps already doing the wrong thing, so it may be we should just take the pain of a one-time change.

rsc commented 1 year ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

rittneje commented 1 year ago

@rsc I'm surprised and disappointed to see this proposal being accepted. On many other proposals, it has been argued that we should prefer making the compiler smarter, fixing the problem for everyone retroactively, instead of adding new functions that people have to rewrite all their code to use (e.g., strings.Compare).

willfaught commented 1 year ago

I believe it's not uncommon (though I don't have data) for the b.N loop to be factored out of the Benchmark function itself, whereas I believe the actual body of the benchmark is basically always lexically within the loop.

@aclements I don't follow. What does it mean for the loop to be factored out of the benchmark func? Which part of the benchmark func do you mean by "actual body of the benchmark"?

It may cause surprising changes in existing benchmarks. Those benchmarks are perhaps already doing the wrong thing, so it may be we should just take the pain of a one-time change.

Go benchmarks don't have a history that the Go tool compares results against (which would be a great feature), so I don't see the issue. We don't worry about compiler improvements throwing off benchmark results, as far as I know. And as I pointed out, there's a workaround for getting back optimizations for setup/teardown code.

I'm surprised and disappointed to see this proposal being accepted.

I'm surprised as well. This doesn't seem very Go-like. Is there a precedent of taking this special-function-wrapped-around-value optimization approach before in Go?

A function approach seems more consistent with how Go does things:

func BenchmarkFib10(b *testing.B) {
    b.Do(func() { Fib(10) })
}

where the compiler is free to disable the appropriate optimizations inside the func literal for b.Do.

cespare commented 1 year ago

@willfaught FWIW I have a ton of code that has complex benchmarks like

func BenchmarkFoo(b *testing.B) {
    for _, bb := range []struct{
        name string
        /* lots of testing parameters */
    } {
        { /* test case 1 */ },
        // ...
    } {
        // lots of setup code
        b.Run(bb.name, func(b *testing.B) {
            benchFoo(b, bb.x, bb.y, some, other, params)
        })
    }
}

func benchFoo(b *testing.B, x, y, z int) {
    // ...
    for i := 0; i < b.N; i++ {
        // ...
    }
}

IOW, my experience lines up with what @aclements said: it's common that the b.N loop isn't lexically inside the BenchmarkXxx function, but the thing I want to measure is always lexically inside the b.N loop.

This doesn't seem very Go-like. Is there a precedent of taking this special-function-wrapped-around-value optimization approach before in Go?

A function approach seems more consistent with how Go does things: [...] where the compiler is free to disable the appropriate optimizations inside the func literal for b.Do.

Your b.Do just sounds like b.Loop from #61515, which is I guess part of this proposal as well now. Loop is described and implemented in terms of Keep. So it sounds like what you said boils down to you would like Loop but without exposing Keep. Is that right?

If so, then I guess I'd point you at @rsc's comment when I asked about this:

If we do the special casing, then we basically have to expose Keep too, so that we can explain what the special casing does and provide a way for users to have that same power themselves.

aclements commented 1 year ago

@aclements I don't follow. What does it mean for the loop to be factored out of the benchmark func? Which part of the benchmark func do you mean by "actual body of the benchmark"?

I mean that the b.N loop often doesn't appear directly inside a Benchmark function, like in this example:

func Benchmark(b *testing.B) {
    b.Run("1", func(b *testing.B) { f(b, 1) })
    b.Run("2", func(b *testing.B) { f(b, 2) })
}

func f(b *testing.B, arg int) {
    for i := 0; i < b.N; i++ {
        // .. do something ..
    }
}

Factoring the b.N loop out of Benchmark and into a helper function would change the behavior of this benchmark if we apply "implicit Keep" only to Benchmark functions. If instead we apply implicit Keep to b.Loop, this sort of refactoring has no effect on the benchmark results.

Of course, it's also possible that the b.N loop is in a Benchmark function but the body of it gets factored into another function. In that case, neither recognizing Benchmark functions nor recognizing b.Loop will help.

Another option would be that the compiler recognizes a loop over testing.B.N and deoptimizes that, wherever it appears. To me, that feels more magic than deoptimizing a b.Loop loop, but also deoptimizing b.Loop loops doesn't preclude deoptimizing testing.B.N loops. We could do both.

@rsc is planning to gather some data on how often the b.N loop is factored out of the Benchmark function. He had to reason through this for the vet check from #38677, and said that it seemed to be pretty common, but he'll get hard data on that.

(Haha, looks like @cespare beat me to this point by 2 minutes. :smile:)

Go benchmarks don't have a history that the Go tool compares results against

I believe comparing across time is one of the most common uses of benchstat. We regularly get reports of things that have slowed down from one release to another. On the Go team, we certainly do this all the time with https://perf.golang.org/dashboard/.

I'm surprised as well. This doesn't seem very Go-like. Is there a precedent of taking this special-function-wrapped-around-value optimization approach before in Go?

There isn't precedent in Go for any of these approaches. I could see an argument for not doing any implicit Keep/benchmark deoptimization because any such approach is too implicit, but I think that's not the argument you're making.

A function approach seems more consistent with how Go does things

This is what I originally proposed in #61515. However, it's harder to eliminate the overhead of that for very short benchmarks (certainly not impossible, but it requires more inlining and more complex inlining). This also opens the possibility of passing something that isn't just a function literal, in which case we definitely wouldn't be able to deoptimize the loop body.

zigo101 commented 1 year ago

It is somewhat weird that the Keep function is added to the testing package. At least it should be put in the runtime package. Building it in is even better.

ianlancetaylor commented 1 year ago

The point of testing.Keep is to force evaluation in a benchmark. It's not expected to have much application outside of benchmarking code. So there doesn't seem to be an obvious reason to put it in the runtime package.

zigo101 commented 1 year ago

What is the effect of calling testing.Keep in non-test files? Same as in test files or not?

ianlancetaylor commented 1 year ago

It should be the same whether it is in a test file or not.

zigo101 commented 1 year ago

So it is a general-purpose function in syntax/semantics, but a testing-specific function subjectively. Not a big problem though.

willfaught commented 1 year ago

@cespare @aclements Thanks for the explanations.

Your b.Do just sounds like b.Loop from #61515, which is I guess part of this proposal as well now. Loop is described and implemented in terms of Keep.

I agree it's the same as #61515, although the Loop in this proposal seems to be different, not taking a function, and returning a boolean.

Regarding that change, can this proposal be updated to include that? It's difficult to track the current state of the proposal by piecing together all the comments.

So it sounds like what you said boils down to you would like Loop but without exposing Keep. Is that right?

Yes, assuming you mean the Loop from #61515, and not the Loop here, as explained just above.

If so, then I guess I'd point you at https://github.com/golang/go/issues/61179#issuecomment-1647160963 when I asked about this:

If we do the special casing, then we basically have to expose Keep too, so that we can explain what the special casing does and provide a way for users to have that same power themselves.

Why do we need to expose Keep to explain what Loop is doing? I don't see why we can't explain what Loop does in the same way, e.g. "All function values, all arguments, and all function results are forced to be evaluated etc etc etc..." Why do users need this general power?

What if we limit the disabled optimizations to just func literals that are assigned to a new testing package type, type BenchFunc func(), with func (*B) Loop(BenchFunc)? Then

func Benchmark1(b *testing.B) {
    b.Loop(func() { Fib(10) }) //  Not optimized
}

func Benchmark2(b *testing.B) {
    var notOptimized testing.BenchFunc = func() { Fib(10) }
    var optimized func() = func() { Fib(10) }
    b.Loop(notOptimized)
    b.Loop(testing.BenchFunc(optimized))
}

work as expected. @cespare's example would be

func BenchmarkFoo(b *testing.B) {
    for _, bb := range []struct{
        name string
        /* lots of testing parameters */
    } {
        { /* test case 1 */ },
        // ...
    } {
        // lots of setup code
        b.Run(bb.name, func(b *testing.B) {
            benchFoo(b, bb.x, bb.y, some, other, params)
        })
    }
}

func benchFoo(b *testing.B, x, y, z int) {
    // ...
    b.Loop(func() {
        Foo(x, y, z)
    })
}

aclements commented 1 year ago

Regarding that change, can this proposal be updated to include that? It's difficult to track the current state of the proposal by piecing together all the comments.

The #61515 proposal does include a pointer to the latest version in the top post. We tend not to do significant rewrites of the top post in a proposal because then it makes it hard to follow the conversation that follows it, and instead add updates to it linking to the comment explaining the latest version. There's no really ideal way to do this. It may be that the way I wrote the update to #61515 wasn't clear enough, so I've tried to rewrite it.

Why do we need to expose Keep to explain what Loop is doing?

You're right that we can explain how Loop deoptimizes without exposing Keep. However, not exposing Keep limits refactoring opportunities, and also makes it impossible to write examples that allow partial optimization like in @bcmills' comment. Granted, we expect both of these situations to be rare.

What if we limit the disabled optimizations to just func literals that are assigned to a new package testing type type BenchFunc func(), with func (*B) Loop(BenchFunc).

This seems strictly more complicated to me.

Earlier you argued that "A function approach seems more consistent with how Go does things", but I'm not sure I agree with that. Go APIs tend not to reach for closures when simpler and more direct constructs will do. For example, for b.Loop() { ... } makes it clear even if you don't know what b.Loop is that we're going to execute the code in the body of the loop some number of times, and it can't involve any potentially complicated capture or scoping. Passing a closure, on the other hand, enforces no constraints on how or when that closure may be invoked, or the capture behavior of state used by that closure.

aclements commented 1 year ago

examples that allow partial optimization like in @bcmills' comment

Oops, I guess his example doesn't technically show partial optimization, since there's only one argument to the function under test. Partial optimization would mix one (or more) arguments passed in the b.Loop body to the intermediate closure with one (or more) arguments passed directly from the intermediate closure. The former would not be constant-propagated, while the latter could be.