cmd/go: add per-test timeouts

rsc commented 3 years ago

Tests have an overall timeout for the entire binary but no timeout for a specific test case. You often want to limit any particular test case to a time much shorter than the overall binary. I propose to add the concept of per-test (function) timeouts to the go command user experience as follows.

Each test gets a per-test timeout. The timer for a given test only ticks down when the test is running. It does not tick down when the test is blocked in t.Parallel, nor when it is blocked in t.Run running a subtest.
The default per-test case timeout is 1m (one minute). If the new -testtimeout flag is specified explicitly, then that sets a different default. If the -testtimeout flag is omitted but -timeout is specified explicitly, then that sets the default too. This way, if you have one really long test and use go test -timeout=30m, the per-case timeout doesn't kick in after 1 minute and kill it anyway.
There is a new testing.TB method SetTimeout(d time.Duration) that allows a test to set its own timeout. Calling SetTimeout does not reset the timer. If a test runs for 30 seconds and then calls t.SetTimeout(1*time.Second), it gets killed for having timed out. A timeout set this way is inherited by subtests. (They each have their own timer.)
When a test timeout happens, the whole process still gets killed. There's nothing we can really do about that. But the failure does say which test function timed out.

This change will help users generally, and it will also help fuzzing, because any individual invocation of a fuzz function will now have a testing.T with a 1-minute timer, providing for detection of inputs that cause infinite loops or very slow execution.

bcmills commented 3 years ago

When a test timeout happens, the whole process still gets killed. There's nothing we can really do about that. But the failure does say which test function timed out.

I suspect this would help (at least somewhat) with the use-cases motivating #39038, too.

cespare commented 3 years ago

This sounds great. Besides being an overall better experience when there's a hanging test, it may allow for simpler test code in some situations. Sometimes I've written tests which exercise concurrent code which, if it fails, is expected to hang/deadlock; then I spawn my own timer to fail the test if it hasn't completed after, say, 5s. Depending on whether the kill-the-whole-test behavior is acceptable, I could change these to simply use t.SetTimeout(5 * time.Second) instead.

cespare commented 3 years ago

Would there still be a 10m default -timeout?

I would prefer there not to be, since hitting that timeout without hitting a 1m per-test timeout seems like a signal that we have a giant test suite operating correctly, not that we have a hanging/broken test.

mvdan commented 3 years ago

So, as per my reading, -testtimeout or -timeout just set the per-test default timeout, so T.SetTimeout always takes precedence. Is that correct?

I think that's the most intuitive to the test author. If someone wants to constrain the overall run-time of tests, they'd use the global -timeout, presumably.

If the -testtimeout flag is omitted but -timeout is specified explicitly, then that sets the default too.

I understand the reasoning behind this, but I'm also slightly uneasy about it. Many CIs currently use lines like go test -timeout=30m ./...; that means they would get zero net benefit from this proposal, unless they noticed they'd need to change to go test -timeout=30m -testtimeout=1m ./....

When a test timeout happens, the whole process still gets killed. There's nothing we can really do about that.

Could you expand on the reason why we can't do better?

As a separate point; would https://pkg.go.dev/testing#T.Deadline be updated to take into account the timeout of the running test too? I assume it would be, but just clarifying since this proposal doesn't say.

Overall I really like this idea, for the reasons Bryan and Caleb mentioned too. I think my only main worry is that setting a per-test timeout with T.SetTimeout would be hard. Some machines are one or even two orders of magnitude faster than others in terms of CPU, disk, or network, especially when you take into account busy machines, like public shared CI runners.

This is not a problem for the -timeout or -testtimeout flags; they can be changed depending on the context. I can use -testtimeout=1m on my laptop, and -testtimeout=3m on the public CI, for example. However, T.SetTimeout is fairly static and can't be adjusted depending on the environment, as far as I can see.

Maybe we could also think about "testing timeout scales"; so I could run my tests in a way that all per-test timeouts are tripled if I know my machine is significantly slower than an average laptop (e.g. public CI). I think Go's builders do something similar, where very slow computers such as small ARM/RISCV boards get many timeouts multiplied to avoid benign timeouts.

rogpeppe commented 3 years ago

When a test timeout happens, the whole process still gets killed. There's nothing we can really do about that. But the failure does say which test function timed out.

Would the failure include all messages logged up until the timeout?

rsc commented 3 years ago

I understand the reasoning behind this, but I'm also slightly uneasy about it. Many CIs currently use lines like go test -timeout=30m ./...; that means they would get zero net benefit from this proposal,

Zero net benefit, but also zero breakage. It's hard to have the latter without the former. If those same CIs have individual test cases that run for 10 minutes, I'm trying to avoid breaking them. Or do you think it's OK to break them? (Honest question.)

If we were starting from scratch it would be much easier to say the two are separate.

rsc commented 3 years ago

@rogpeppe:

Would the failure include all messages logged up until the timeout?

Yes, I would hope so. I think that's possible independent of this proposal though. Right now we just do a panic but I don't see why we couldn't go around flushing output first. (Maybe it's tricky, I don't know.)

mvdan commented 3 years ago

Zero net benefit, but also zero breakage. It's hard to have the latter without the former.

That's fair - I don't have a better alternative in mind. I was mainly highlighting this because we should make it clear in the release notes: that any go test -timeout users will not see this benefit unless they tweak their flags.

rsc commented 3 years ago

Could you expand on the reason why we can't do better? [than crashing the whole process]

We have no way to forcefully stop a goroutine asynchronously. The reason we have no way is that it is unsafe: the goroutine may be holding locks, etc. There also may be other goroutines that are running as part of the test. There is no safe way to attribute which goroutines need stopping (and then also no way to safe way to stop them).

josharian commented 3 years ago

There is no safe way to attribute which goroutines need stopping (and then also no way to safe way to stop them).

If we had a testing.TB Context method then the tests and benchmarks could do cooperative graceful shutdown. (Examples are a bit trickier without further changes, but are also less likely to be problematic.)

rsc commented 3 years ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

joe-mann commented 3 years ago

What is the benefit of having SetTimeout(d time.Duration) as well as the -testtimeout flag? Would it not suffice to have the former, but not the latter? There are a couple of issues I see with -testtimeout.

Firstly is a basic usability concern. I would suggest that applying the same per-test timeout to all tests (being run) would be unhelpful a lot of the time. However I can much more readily think of situations where you'd want to run go test ./..., but apply specific timeouts to specific tests by adding calls to SetTimeout. But there are far fewer situations where it would make sense to apply to same per-test timeout to all tests - to do so would require you to select the longest-running test contained within the specified packages and set -testtimeout accordingly. And this requires knowledge of code base that the user who runs go test may not possess.

Secondly is a more nuanced concern about how -testtimeout fits into the existing environment. Running go test ./... -testtimeout=1m may have a significant effect on the outcome of a test - i.e. it may fail when previously it passed. And this is an important detail that is detached from the test itself. Currently, if we want a test to be executed in parallel (with other like-minded tests), we can call Parallel at the top of that test. There is no -parallel flag that causes all tests to run in parallel. Given that a timeout can have similarly outcome-altering effects on a test, would it not be more consistent, and more explicit, for per-test timeouts only be set inside the test itself?

@mvdan's suggestion of a way to scale the value passed to SetTimeout seems to me a good way of making -testtimeout largely redundant, whilst also providing a mechanism to adjust the per-test timeout according to the environment in which the tests are being run:

Maybe we could also think about "testing timeout scales"; so I could run my tests in a way that all per-test timeouts are tripled if I know my machine is significantly slower than an average laptop (e.g. public CI). I think Go's builders do something similar, where very slow computers such as small ARM/RISCV boards get many timeouts multiplied to avoid benign timeouts.

bcmills commented 3 years ago

@josharian, no Context method is needed. Tests and benchmarks can already do cooperative graceful shutdown via the Deadline method — we just need to allow more overhead today because all of the tests end up shutting down at once when the timeout is about to expire.

For an example of how this works today in the cmd/go tests, see: https://github.com/golang/go/blob/19457a58e565ff5b480c3806fe02fbc7ccdf32f0/src/cmd/go/script_test.go#L47-L72

josharian commented 3 years ago

@bcmills t.Deadline lets you predict when your test will time out. It doesn't provide a means to receive the information that some other test has failed (or timed out) and that as a result you should promptly shut down.

bcmills commented 3 years ago

Ah, I see what you mean. But I think that is more-or-less completely orthogonal to having per-test timeouts — that's more of an interaction between t.Parallel() and the existing -test.failfast flag.

cespare commented 3 years ago

@joe-mann to me, most of the value of this proposal comes from -testtimeout.

The way I would use it, it's mostly about improving the handling of deadlocked tests. For example, I interact with a lot of test suites which can take several minutes to run on heavily loaded CI workers but each individual test takes a few seconds at most. It's not like I would find the longest-running test and set -testtimeout to slightly higher than that; instead, I would want -testtimeout to be high enough that no reasonable test could possibly hit it without a bug (whatever that time scale happens to be for the domain). For my test suites, the default value of 1m sounds pretty good.

Then, when a test deadlocks, go test fails faster (up to 9m sooner than it would've otherwise, assuming the default -timeout 10m, or even more if you've set -timeout higher). And for really large test suites, you don't have to go through the dance of (1) start bumping up against the 10m default timeout and get some spurious failures, (2) look into whether they are "natural" timeouts or deadlocks, and (3) raise the -timeout. (That's assuming we delete the 10m default -timeout as I brought up in https://github.com/golang/go/issues/48157#issuecomment-911946245.)

I think SetTimeout is useful for certain tests that are specifically about checking that a particular thing doesn't hang. But I think I would use SetTimeout occasionally whereas -testtimeout would help me every day.

rsc commented 3 years ago

Are there any remaining objections?

mvdan commented 3 years ago

My only remaining worry is what I outline in the last three paragraphs in https://github.com/golang/go/issues/48157#issuecomment-912402979 - that setting absolute time.Duration timeouts for each test feels too rigid. I think built-in support for a "scale" or multiplier would be necessary, e.g. for slower machines such as CI. Otherwise we'll either end up with realistic timeouts that cause sporadic failures on CI, or inflated timeouts that reduce flakes but take too long to make a test fail when locally iterating and testing.

A couple of comments later in the thread seem to agree that this problem warrants some attention. I personally think that some sort of multiplier seems like a sane solution, if you can roughly define what the "base" speed is - such as an average laptop from two years ago that's idle. This solution seems to have been working well enough for build.golang.org, as I understand some builders use a multiplier given slower hardware.

rsc commented 3 years ago

@mvdan There is already an absolute duration in -timeout=, right? It seems like maybe we should have an automatic timeout scale (essentially elevating what cmd/dist does), but that could be done in a separate, orthogonal proposal. And it might not be worth doing until we have more information, since tests can of course check for environment variables and scale on their own, and people can write helpers that do that.

cespare commented 3 years ago

@rsc can you confirm whether this proposal includes removing the 10m default -timeout?

rsc commented 3 years ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

mvdan commented 3 years ago

@rsc I guess I don't have a problem with people implementing their own scaling, but it would be a shame for each project to reinvent a slightly different wheel with different standards on what the "base timeout scale" is. I am hoping we can at least get ahead of defining what the scale of the default timeout should be. If some projects use "enough even for the slowest machine", and others use "matching a modern fast machine", then adding built-in scaling at a later time won't work with the mismatching baselines.

rsc commented 3 years ago

@mvdan, please feel free to file a proposal for a scaling mechanism. Thanks!

rsc commented 3 years ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

gopherbot commented 3 years ago

Change https://golang.org/cl/363134 mentions this issue: internal/fuzz: set timeout for each exec of fuzz target

mvdan commented 2 years ago

As a separate point; would https://pkg.go.dev/testing#T.Deadline be updated to take into account the timeout of the running test too? I assume it would be, but just clarifying since this proposal doesn't say.

I wonder if this question got lost in the conversation. I was looking at the API again, and it says:

Deadline reports the time at which the test binary will have exceeded the timeout specified by the -timeout flag.

My intuition is that we should change the docs and behavior to account for per-test timeouts, but it's worth noting that the current docs specify that it only looks at the timeout for the whole test binary.

gopherbot commented 2 years ago

Change https://golang.org/cl/380497 mentions this issue: internal/lsp/regtest: eliminate arbitrary timeouts

gopherbot commented 2 years ago

Change https://go.dev/cl/445597 mentions this issue: internal/testenv: remove RunWithTimout

bitfield commented 1 year ago

It seems as though this has languished a bit. Is any work currently in progress to implement this—if not, are PRs welcome?

bcmills commented 1 year ago

I don't know of anyone working on it at the moment. You are welcome to send a change.

(I recommend the Gerrit workflow rather than a GitHub PR; see https://go.dev/doc/contribute.)

tpaschalis commented 1 year ago

I'm putting something together to send a CL

gopherbot commented 1 year ago

Change https://go.dev/cl/526355 mentions this issue: testing: add individual timer for tests, benchmarks and fuzz targets

bcmills commented 1 year ago

How this should this proposal interact with #56986 (extended backwards compatibility)?

Should there be a GODEBUG setting that changes the default timeout?
Or, should the default timeout depend on the go version declared in the go.work file and/or the main module's go.mod file?

bcmills commented 1 year ago

I think the answer is probably a GODEBUG, since that would allow the default timeout to be bypassed on a per-package basis using a //go:debug comment, rather than pinning an entire module or workspace to an older go version.

(That's probably more user-friendly than expecting users to add a SetTimeout call to every individual test if they're having timeout trouble.)

bcmills commented 1 year ago

Hmm, we also still need to clarify the questions raised in https://github.com/golang/go/issues/48157#issuecomment-1004259734. 🤔

bcmills commented 1 year ago

Another edge case: if -benchtime is close to (or greater than) -testtimeout, how should the two interact?

If -benchtime is longer than -testtimeout, should the timeout be extended so that the benchmark is not guaranteed to fail? (If so, just the default -testtimeout, or an explicit -testtimeout too?)

Or, should the -testtimeout timer be paused while the benchmark timer is running? (And does the answer to that change if the benchmark function calls SetTimeout explicitly?)

Or, should -testtimeout not apply at all to benchmarks? (Then SetTimeout would presumably be omitted from testing.B and testing.TB..?)

bcmills commented 1 year ago

Another interaction: subtests.

[This part was answered already]

Should each subtest by default get its own `-testtimeout` increment, independent of the parent, or should the parent's timeout bound the overall running time of the subtests? If each subtest gets its own `-testtimeout` increment, presumably the timer for the parent test should be paused while a non-parallel subtest is running, and paused again once the parent test function exits (while we wait for parallel subtests to run)?

But: if we pause the parent test's timer, will that also cause its Deadline to drift upward over time? (If so, that would make Deadline a bit misleading for tests with subtests — but if not, it would be much more difficult for a test without subtests to arrange for useful logging before it times out.)

bcmills commented 1 year ago

Another edge case: does SetTimeout(0) or SetTimeout(<negative value>) mean “unlimited”, or “time out immediately”?

(I would prefer “unlimited” to be SetTimeout(math.MaxInt64) to make it less dangerous to use time.Until to manage timeouts.)

bcmills commented 1 year ago

I just realized that the subtest question is actually already answered in the initial proposal. 😅

But I do think we need to resolve that interaction with Deadline. 🤔

tpaschalis commented 1 year ago

Hey there @bcmills thanks for all the comments 🙌 Apologies for the delay here, let me try to respond.

Another edge case: does SetTimeout(0) or SetTimeout() mean “unlimited”, or “time out immediately”?

I agree with the option of calling SetTimeout(t) for t<=0 to mean "time out immediately". If nothing is set, it still defaults to the value of -test.timeout or 1m.

How this should this proposal interact with https://github.com/golang/go/issues/56986 (extended backwards compatibility)?

I was not familiar with the proposal. All in all, I think that the percentage of tests that would take >1m and whose authors would not have set an explicit timeout will be low, but I'm happy to do something to avoid breaking them if you think it makes sense. From these two options, I agree that a GODEBUG flag would be the better one. If you have any CLs in hand I can look for inspiration, it'd be great!

if we pause the parent test's timer, will that also cause its Deadline to drift upward over time

We're talking about the overall testContext's deadline that is reported by the t.Deadline method, right? My thought is no, since it applies to the entire testContext that is common for all tests being run, I think it can still work as the independent timer as it does today and have no interaction with the new timer. If we want to expose a per-test deadline, we should add a new method. What do you think?

Another edge case: if -benchtime is close to (or greater than) -testtimeout, how should the two interact?

I view these as two separate parameters (same as -fuzztime or -count) and I think they should not interact, especially as -benchtime can be of the form of a multiplier (eg. 100x). Also, implicitly extending the per-test timeout based on some external condition sounds weird to me. Isn't guaranteeing a benchmark to fail the better option here?

bcmills commented 1 year ago

I discussed these questions with @rsc, and here is our thinking.

We would prefer to avoid sentinel values, so SetTimeout(0) should cause an immediate timeout. (There is no mechanism to set it back to the default -testtimeout value apart from reading the flag directly, but that's probably ok.)

T.Deadline should reflect the current timeout. The return value from T.Deadline should reflect the remaining per-test timeout when that is sooner than the overall -timeout deadline. That means that the result of Deadline may change after a call to Parallel, Run, or SetTimeout, but that's probably ok: it's just something to be aware of. (Calls to Parallel in particular are usually at the start of the test anyway.)

When running benchmarks, -benchtime should remain independent of -testtimeout. The -testtimeout flag can bound the overall running time of the benchmark, including reruns to reach the requested -benchtime. (Consider -benchtime=100x — it would be strange to try to adjust a timeout vs. a number of iterations.)

cespare commented 1 year ago

If we're answering open questions on this proposal, can I get confirmation that we're going to remove the default -timeout? My previous inquiries got no replies. The CL that's out doesn't look like it changes the default 10m -timeout but I really think it should.

tpaschalis commented 1 year ago

Thanks for the feedback @bcmills, and apologizing for the delays here. I will integrate this into my CL.

As for @cespare's point, IMHO this is a separate setting and they shouldn't interact with one another. This way we both keep backwards compatibility with the overall test timer, and the new setting simply allows for more fine-grained setting. Why do you think that setting should be removed?

cespare commented 1 year ago

I explained that in my first comment, which I linked (and presumably the up-voters of that comment agreed to some degree):

hitting that timeout without hitting a 1m per-test timeout seems like a signal that we have a giant test suite operating correctly, not that we have a hanging/broken test.

To expand slightly, there are two ways to encounter the default 10m timeout:

A test is stuck/deadlocked, and after 10m the test crashes. (The fix is to fix the test.)
The test suite is large and very slow and eventually starts taking >10m and hitting the timeout. (The fix is to raise -timeout and, ideally, speed up the tests.)

The default -timeout of 10m is useful in scenario 1 and not useful in scenario (2). With a default -testtimeout of 1m, scenario 1 is completely taken care of (in a better way -- it fails after 1m instead of 10m), so the only effect of the default -timeout of 10m is to be an annoyance in scenario 2.

this is a separate setting and they shouldn't interact with one another.

On the contrary, these are deeply intertwined and defining the interaction between timeout settings has been a primary topic of the original proposal and subsequent discussions.

rsc commented 1 year ago

Making the default overall test binary timeout "off" sounds fine to me. We just need to make sure the new 1m per-test timeout applies to all the kinds of tests: tests, examples, fuzzing, and benchmarking.

maxammann commented 9 months ago

The current way of panicing when the hard-coded 10s timeout is hit seems not ideal: https://github.com/golang/go/issues/56238#issuecomment-1948138937

At least it seems to confuse.

tpaschalis commented 3 months ago

Hey everyone 👋 I apologize in advance for the huge gap here, life got in the way :)

I'm back and will try to implement feedback on CL 526355. (cc @cespare who had reached out about it).

So if I understand correctly:

We want to add a GODEBUG setting
T.SetTimeout(d) for d<=0 will panic immediately
We want the 1m per-test timeout to apply to tests, examples, benchmarks and fuzz targets
This means removing the current default value of -timeout (10m) and it will instead default to 0 (unset)
We will allow people to set both flags at once if they wish so
Change the Deadline method and docs to work with the new -testtimeout flag

I'll rebase and amend the CL, and try to introduce some more correctness tests. Thanks for the keeping the conversation rolling!

ianlancetaylor commented 3 months ago

It sounds like we want to say something like:

Each test gets a per-test timeout. The timer for a given test only ticks down when the test is running. It does not tick down when the test is blocked in t.Parallel, nor when it is blocked in t.Run running a subtest.
The default per-test case timeout is 1m (one minute).
- If the new -testtimeout flag is specified explicitly, then that sets a different default.
- If the -testtimeout flag is omitted but -timeout is specified explicitly, then that sets the default too. This way, if you have one really long test and use go test -timeout=30m, the per-case timeout doesn't kick in after 1 minute and kill it anyway.
- if GODEBUG=testtimeout=0 is set, there is no per-test case timeout by default. People can still use -testtimeout to set the per-test timeout, overriding the GODEBUG. People can still use -timeout, but the GODEBUG setting means that there will still be no per-test timeout, just a total test timeout.
- The timeout applies to tests, benchmarks, examples, and fuzz tests.
- The (*testing.T).Deadline method will report the deadline from the -timeout or the -testtimeout option, whichever is shorter.
There is a new testing.TB method SetTimeout(d time.Duration) that allows a test to set its own timeout. Calling SetTimeout does not reset the timer. If a test runs for 30 seconds and then calls t.SetTimeout(1*time.Second), it gets killed for having timed out. A timeout set this way is inherited by subtests. (They each have their own timer.) Calling SetTimeout(0) causes the timeout to expire immediately.
When a test timeout happens, the whole process still gets killed. There's nothing we can really do about that. But the failure does say which test function timed out.
The go command will stop passing -test.timeout=10m. (See also (*goTest).buildArgs in cmd/dist/test.go). The default value of 1m for -test.testtimeout may come from the go command or it may be the default in the testing package.

tpaschalis commented 3 months ago

Thanks for the focused summary here! One thing that we didn't explicitly outline is the relationship with -benchtime.

I'm looking towards simplifying things and leaving them separate so if -benchtime > -testtimeout the benchmark can time out.

tpaschalis commented 3 months ago

cc @cespare thanks for the context on that #69181 issue.

How does that impact your view for this proposal? Would you have it so that we still try to make timeouts and -benchtime separate and make so that the per-function timeout doesn't impact benchmarks at all?

golang / go

cmd/go: add per-test timeouts #48157