golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.35k stars 17.71k forks source link

testing: timeouts prevent deadlock detection from working #69188

Open myaaaaaaaaa opened 3 months ago

myaaaaaaaaa commented 3 months ago

Go version

go version go1.23.0 linux/amd64

Output of go env in your module/workspace:

Click to expand ```shell GO111MODULE='' GOARCH='amd64' GOBIN='' GOCACHE='~/.cache/go-build' GOENV='~/.config/go/env' GOEXE='' GOEXPERIMENT='' GOFLAGS='' GOHOSTARCH='amd64' GOHOSTOS='linux' GOINSECURE='' GOMODCACHE='~/go/pkg/mod' GONOPROXY='' GONOSUMDB='' GOOS='linux' GOPATH='~/go' GOPRIVATE='' GOPROXY='https://proxy.golang.org,direct' GOROOT='/usr/lib/go' GOSUMDB='sum.golang.org' GOTMPDIR='' GOTOOLCHAIN='auto' GOTOOLDIR='/usr/lib/go/pkg/tool/linux_amd64' GOVCS='' GOVERSION='go1.23.0' GODEBUG='' GOTELEMETRY='on' GOTELEMETRYDIR='~/.config/go/telemetry' GCCGO='gccgo' GOAMD64='v1' AR='ar' CC='clang' CXX='clang++' CGO_ENABLED='0' GOMOD='/dev/null' GOWORK='' CGO_CFLAGS='-O2 -g' CGO_CPPFLAGS='' CGO_CXXFLAGS='-O2 -g' CGO_FFLAGS='-O2 -g' CGO_LDFLAGS='-O2 -g' PKG_CONFIG='pkg-config' GOGCCFLAGS='-fPIC -m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3043789877=/tmp/go-build -gno-record-gcc-switches' ```

What did you do?

package main

func main() {
    select {}
}

package main_test

import "testing"

func TestDeadlock(t *testing.T) {
    select {}
}

What did you see happen?

$ go run main.go
fatal error: all goroutines are asleep - deadlock!
...

$ go test -timeout 0
fatal error: all goroutines are asleep - deadlock!
...

$ go test -timeout 1s    # note that the default is 10m
panic: test timed out after 1s
    running tests:
        TestDeadlock (1s)
...

What did you expect to see?

 $ go run main.go
 fatal error: all goroutines are asleep - deadlock!
 ...

 $ go test -timeout 0
 fatal error: all goroutines are asleep - deadlock!
 ...

 $ go test -timeout 1s    # note that the default is 10m
-panic: test timed out after 1s
-   running tests:
-       TestDeadlock (1s)
+fatal error: all goroutines are asleep - deadlock!
 ...
gabyhelp commented 3 months ago

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

ianlancetaylor commented 3 months ago

If there is an active timer, the program is not deadlocked. It will keep running when the timer expires.

To fix this we would need to somehow add a special case for the testing timeout. I don't think it's worth it. There are already many other cases where the deadlock detector does not reliably detect deadlocks.

myaaaaaaaaa commented 3 months ago

Would it suffice to simply keep a count of goroutines excluded from deadlock analysis? Something like the following pseudocode:

diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index c4f175b..5304612 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -5902,10 +5902,12 @@ func incidlelocked(v int32) {
        checkdead()
    }
    unlock(&sched.lock)
 }

+var excludedGoroutines atomic.Int32
+
 // Check for deadlock situation.
 // The check is based on number of running M's, if 0 -> deadlock.
 // sched.lock must be held.
 func checkdead() {
    assertLockHeld(&sched.lock)
@@ -5931,10 +5933,11 @@ func checkdead() {
    // for details.)
    var run0 int32
    if !iscgo && cgoHasExtraM && extraMLength.Load() > 0 {
        run0 = 1
    }
+   run0 += excludedGoroutines.Load()

    run := mcount() - sched.nmidle - sched.nmidlelocked - sched.nmsys
    if run > run0 {
        return
    }
diff --git a/src/testing/testing.go b/src/testing/testing.go
index 526cba3..d6731e2 100644
--- a/src/testing/testing.go
+++ b/src/testing/testing.go
@@ -2370,10 +2370,11 @@ func (m *M) startAlarm() time.Time {
            }
            extra = b.String()
        }
        panic(fmt.Sprintf("test timed out after %v%s", *timeout, extra))
    })
+   excludedGoroutines.Add(1)
    return deadline
 }

 // runningList returns the list of running tests.
 func runningList() []string {
@@ -2387,10 +2388,11 @@ func runningList() []string {
 }

 // stopAlarm turns off the alarm.
 func (m *M) stopAlarm() {
    if *timeout > 0 {
+       excludedGoroutines.Add(-1)
        m.timer.Stop()
    }
 }

 func parseCpuList() {

Relevant code blocks:

https://github.com/golang/go/blob/6885bad7dd86880be6929c02085e5c7a67ff2887/src/runtime/proc.go#L5928-L5940 https://github.com/golang/go/blob/6885bad7dd86880be6929c02085e5c7a67ff2887/src/testing/testing.go#L2352-L2394

ianlancetaylor commented 2 months ago

It doesn't really have anything to do with goroutines. A timer does not imply a goroutine. A timer just mean that at some point in the future a value will appear on a channel or a function will be run. That might in turn unblock some other goroutine.