golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

cmd/go: do not apply a kill timeout to 'go test' if -bench was set #69181

Open cespare opened 2 months ago

cespare commented 2 months ago

Go version

go version go1.23.0 linux/amd64

Output of go env in your module/workspace:

n/a

What did you do?

The code doesn't really matter, but for the sake of illustration suppose we have this benchmark:

package x

import "testing"
import "time"

func Benchmark100MS(b *testing.B) {
    for i := 0; i < b.N; i++ {
        time.Sleep(100 * time.Millisecond)
    }
}

(You can test this with any benchmark.) Consider the following go test invocations:

  1. go test -bench . -benchtime 70s -timeout 1s
  2. go test -c -o x.test ./x/x_test.go && ./x.test -test.bench . -test.benchtime 70s -test.timeout 1s

What did you see happen?

The tests behave differently. When I run the test using go test, it kills the test after 61s:

$ go test -bench 100MS  -benchtime 70s -timeout 1s ./x/x_test.go
goos: linux
goarch: amd64
cpu: AMD Ryzen 9 3900X 12-Core Processor
Benchmark100MS-24       SIGQUIT: quit
PC=0x475921 m=0 sigcode=0

goroutine 0 gp=0x66bdc0 m=0 mp=0x66cc60 [idle]:
runtime.futex(0x66cda0, 0x80, 0x0, 0x0, 0x0, 0x0)
...
*** Test killed with quit: ran too long (1m1s).
exit status 2
FAIL    command-line-arguments  61.073s
FAIL

When I run the test binary, the -test.timeout flag has no effect and the benchmark runs to completion:

$ go test -c -o x.test ./x/x_test.go && time ./x.test -test.bench . -test.benchtime 70s -test.timeout 1s
goos: linux
goarch: amd64
cpu: AMD Ryzen 9 3900X 12-Core Processor
Benchmark100MS-24            836         100261554 ns/op
PASS
./x.test -test.bench . -test.benchtime 70s -test.timeout 1s  0.03s user 0.08s system 0% cpu 1:33.95 total

What did you expect to see?

I would expect the program to behave the same way whether invoked via go test or compiled and run directly.

What's going on here is a bit of a disagreement between the testing package and go test.

The testing package applies the timeout to tests, fuzz tests, and examples, but not to benchmarks. This is very much intentional: the old issue #18845 flagged a regression in this behavior, and it was fixed in commit 470704531d93d1bcc24493abea882f99593bcac6. The regression test is still around: it checks that a 1s benchmark passes when run with -timeout 750ms.

go test sets its own timer to kill stuck tests. That timer is armed unless -test.timeout is 0 (disabled) or -test.fuzz is set. It is set to the value of the test timeout plus one minute, since go test expects the binary to time itself out normally.

So there is an inconsistency here.

I think we should resolve the inconsistency by changing go test not to time out test binaries if -test.bench was set, as it already does for -test.fuzz. The testing package clearly goes out of its way to avoid having -timeout apply to benchmarks; go test should respect that.

This certainly comes up in practice. I work on a pretty large Go codebase. The 10m default test timeout works fine for us; our tests are generally plenty fast. But when I run benchmarks across lots of packages (this is common as part of the sanity checks when doing Go upgrades, for example) I usually forget to set -timeout and then after 11m my benchmarks are killed. (The odd 11m number is the 10m default timeout value plus go test's extra minute.)

I noticed this when thinking about the related issue #48157 which is about adding per-test timeouts.


ianlancetaylor commented 2 months ago

CC @matloob @samthanawalla

matloob commented 2 months ago

This sounds reasonable to me.

BatmanAoD commented 1 month ago

I want to increase the timeout for a benchmark test that is compiled separately before being run, but I can't actually figure out what timeout flag will accomplish this.

But when I run benchmarks across lots of packages... I usually forget to set -timeout and then after 11m my benchmarks are killed.

This implies that setting -timeout somewhere does indeed affect when the benchmarks are killed. But where?

Here's what I see, with Go 1.21:

go test ... -timeout 1ms -o <foo> -c
./<foo> -test.bench <slow benchmark>

The tests don't time out after 1ms or 1 minute, even if I reduce 1ms to 1ns. So I think the timeout flag in this position has no effect.

go test ... -o <foo> -c
./<foo> -test.timeout 1ms -test.bench <slow benchmark>

...as far as I can tell, the test runs to completion, or would perhaps be cut off by the default 11-minute timeout (I am testing that now with a Sleep, since the tests run about 3x faster locally than in CI, so I haven't hit the timeout yet while experimenting). However:

go test ... -o <foo> -c
./<foo> -test.timeout 1us -test.bench <slow benchmark>

This (and 1ns) times out immediately. At first I thought this meant that the 1ms timeout was being applied to each individual test, but that doesn't seem to be the case either, because I saw timings of up to 249391542 ns/op which is far longer than 1ms. So it almost seems like the timeout is always set, and if it triggers soon enough, the test exits, but otherwise the test binary cancels or ignores the timeout somehow.

And finally, just for completeness:

go test ... -o <foo> -c
./<foo> -timeout <any> -test.bench <slow benchmark>

The test binary doesn't recognize the flag.

....so, having looked at this issue and all the ones linked above, I still don't understand what is going on or how to actually adjust the timeout.

ianlancetaylor commented 1 month ago

@BatmanAoD As you know, running go test -c builds a new binary. You are correct that the timeout flag is ignored in that case.

The way to set the timeout is to set it when you run the newly built binary, using <foo> -test.timeout=1h.

This has nothing to do with this issue, so please do not follow up on this issue, but instead use a forum. See https://go.dev/wiki/Questions. Thanks.

BatmanAoD commented 1 month ago

@ianlancetaylor Sorry for not making a more concise message originally, but my description of the behavior is related: it demonstrates an exception to the behavior described in the top comment.

When I run the test binary, the -test.timeout flag has no effect and the benchmark runs to completion

This is true with most timeout values, but a sufficiently short timeout, such as 1us, causes the tests to time out immediately even when running a compiled test binary with -test.run '^$' -test.bench ....

cespare commented 1 month ago

@BatmanAoD The testing package starts a timeout timer, runs the tests and examples and whatnot, then stops the timer, and then runs benchmarks. (The text of my original post explains this as well.)

If your timeout is reached before the benchmarks start running, therefore, it will kill the test.

Even if you have no tests, or if you skip them by using flags such as -test.run '^$' -test.bench ., a sufficiently silly timeout value such as 1ns can cause the timeout timer to fire.

Anyway, I consider all of this basically working as intended and orthogonal to the issue that I'm raising here, so I agree with Ian that you should move further discussion to a better forum. I'll go ahead and hide these comments too.