golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.6k stars 17.61k forks source link

runtime/trace: "preempted" StateTransition sometimes has Stack of single zeroed StackFrame #68090

Open rhysh opened 4 months ago

rhysh commented 4 months ago

Go version

go version devel go1.23-477ad7dd51 Thu Jun 20 16:46:54 2024 +0000 darwin/arm64

Output of go env in your module/workspace:

$ go env -changed

$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/rhysh/Library/Caches/go-build'
GOENV='/Users/rhysh/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/rhysh/work/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/rhysh/work'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='devel go1.23-477ad7dd51 Thu Jun 20 16:46:54 2024 +0000'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/rhysh/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/pw/d_qmtcrd3vs0890gvmrq8qx80000gn/T/go-build2075620848=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

$ go test net/http -run='^$' -bench='BenchmarkClientServerParallel/64/h1' -benchtime=100ms -trace=/tmp/trace
goos: darwin
goarch: arm64
pkg: net/http
cpu: Apple M1
BenchmarkClientServerParallel/64/h1-8               1748             64048 ns/op           22199 B/op        131 allocs/op
--- BENCH: BenchmarkClientServerParallel/64/h1-8
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59008->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59007->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59010->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59013->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59014->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59015->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59016->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59017->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59018->127.0.0.1:58995: read: connection reset by peer
    serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59020->127.0.0.1:58995: read: connection reset by peer
        ... [output truncated]
PASS
ok      net/http        0.404s
$ go tool trace -d=1 /tmp/trace | grep -B 2 -A 2 '@ 0x0' | head -n 30
M=6163918848 P=0 G=25 StateTransition Time=775708360289024 Resource=Goroutine(25) Reason="preempted" GoID=25 Running->Runnable
TransitionStack=
         @ 0x0
                :0

Stack=
         @ 0x0
                :0

--
M=6163918848 P=0 G=26 StateTransition Time=775708372081984 Resource=Goroutine(26) Reason="preempted" GoID=26 Running->Runnable
TransitionStack=
         @ 0x0
                :0

Stack=
         @ 0x0
                :0

--
M=6162198528 P=7 G=2881 StateTransition Time=775708382371201 Resource=Goroutine(2881) Reason="preempted" GoID=2881 Running->Runnable
TransitionStack=
         @ 0x0
                :0

Stack=
         @ 0x0
                :0

--

What did you see happen?

Some StateTransition Events include a Stack and StateTransition.Stack that are not equal to NoStack, but which also don't contain a stack from the Event's goroutine. Instead, they yield a single zeroed StackFrame (PC of 0x0, Line of 0, File and Func of "").

I've only seen this on Running->Runnable transitions, with Reason="preempted".

It's also present in go1.22.4.

Here's the sort of stack I'd expect to see from that execution trace's view of goroutines 25, 26, and 2881:

$ go tool trace -d=1 /tmp/trace | sed -n -e '/G=25 StateTransition/,/^M/ p' | head -n 50
[snip]
M=6162198528 P=4 G=25 StateTransition Time=775708359254528 Resource=Goroutine(25) Reason="system goroutine wait" GoID=25 Running->Waiting
TransitionStack=
        runtime.gopark @ 0x100bdfdb7
                /usr/local/go/src/runtime/proc.go:424
        runtime.gcBgMarkWorker @ 0x100b8808b
                /usr/local/go/src/runtime/mgc.go:1363

Stack=
        runtime.gopark @ 0x100bdfdb7
                /usr/local/go/src/runtime/proc.go:424
        runtime.gcBgMarkWorker @ 0x100b8808b
                /usr/local/go/src/runtime/mgc.go:1363

[snip]
$ go tool trace -d=1 /tmp/trace | sed -n -e '/G=26 StateTransition/,/^M/ p' | head -n 50
[snip]
M=8191703744 P=2 G=26 StateTransition Time=775708359184320 Resource=Goroutine(26) Reason="system goroutine wait" GoID=26 Running->Waiting
TransitionStack=
        runtime.gopark @ 0x100bdfdb7
                /usr/local/go/src/runtime/proc.go:424
        runtime.gcBgMarkWorker @ 0x100b8808b
                /usr/local/go/src/runtime/mgc.go:1363

Stack=
        runtime.gopark @ 0x100bdfdb7
                /usr/local/go/src/runtime/proc.go:424
        runtime.gcBgMarkWorker @ 0x100b8808b
                /usr/local/go/src/runtime/mgc.go:1363

[snip]
$ go tool trace -d=1 /tmp/trace | sed -n -e '/G=2881 StateTransition/,/^M/ p' | head -n 50
M=6163345408 P=4 G=2881 StateTransition Time=775708380563968 Resource=Goroutine(2881) Reason="sync" GoID=2881 Running->Waiting
TransitionStack=
        sync.(*Mutex).Lock @ 0x100bf30ff
                /usr/local/go/src/sync/mutex.go:92
        sync.(*Pool).pinSlow @ 0x100bf3094
                /usr/local/go/src/sync/pool.go:227
        sync.(*Pool).pin @ 0x100bf301b
                /usr/local/go/src/sync/pool.go:220
        sync.(*Pool).Get @ 0x100bf2d6f
                /usr/local/go/src/sync/pool.go:135
        fmt.newPrinter @ 0x100c4dff3
                /usr/local/go/src/fmt/print.go:152
        fmt.Fprintf @ 0x100c4e467
                /usr/local/go/src/fmt/print.go:223
        net/http.(*Request).write @ 0x100e4aacb
                /usr/local/go/src/net/http/request.go:680
        net/http.(*persistConn).writeLoop @ 0x100e73df7
                /usr/local/go/src/net/http/transport.go:2522

Stack=
        sync.(*Mutex).Lock @ 0x100bf30ff
                /usr/local/go/src/sync/mutex.go:92
        sync.(*Pool).pinSlow @ 0x100bf3094
                /usr/local/go/src/sync/pool.go:227
        sync.(*Pool).pin @ 0x100bf301b
                /usr/local/go/src/sync/pool.go:220
        sync.(*Pool).Get @ 0x100bf2d6f
                /usr/local/go/src/sync/pool.go:135
        fmt.newPrinter @ 0x100c4dff3
                /usr/local/go/src/fmt/print.go:152
        fmt.Fprintf @ 0x100c4e467
                /usr/local/go/src/fmt/print.go:223
        net/http.(*Request).write @ 0x100e4aacb
                /usr/local/go/src/net/http/request.go:680
        net/http.(*persistConn).writeLoop @ 0x100e73df7
                /usr/local/go/src/net/http/transport.go:2522

[snip]

What did you expect to see?

I expected the stack to be trace.NoStack when no stack was available, or for the stack to contain PC/Func/File/Line corresponding to code that the goroutine had on its stack. I should not see PC of 0x0.

CC @mknyszek @golang/runtime

gabyhelp commented 4 months ago

Similar Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

mknyszek commented 4 months ago

I have a suspicion as to how this is happening, but not a complete picture yet.

The problematic transition you point out also happens from a thread stack. The two cases where such a transition may appear are gopreempt_m and goyield_m. The former is only called from newstack while the latter is only called from goyield which is on the sema path (for a direct handoff).

I suspect that in one of these paths, gp.sched is still empty somehow, though I'm not sure how that's possible. That's at least a case where traceStack may produce a length-1 buffer with a single zero PC in it.

mknyszek commented 4 months ago

68093 may be related, but I suspect not.

mknyszek commented 4 months ago

Here's a simple reproducer:

package main

import (
    "log"
    "os"
    "runtime"
    "runtime/trace"
)

func main() {
    f, err := os.Create("trace.out")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    trace.Start(f)

    go func() {
        for {
            // Non-stop preemption points.
            g()
        }
    }()

    runtime.GC()
    trace.Stop()
}

//go:noinline
func g() {

}

It disproves my theory about gp.sched being empty. Also, funnily enough, the goroutine created in main always shows up fine -- it's the GC mark worker that's the problem.

mknyszek commented 4 months ago

OK, I figured it out. It's that the stack trace has exactly 1 frame in it, but the skip count is also 1.

In the reproducer, the victim goroutine is the GC mark worker (just like in the original post) and when I lower the skip count from 1 to 0, I see:

M=683710 P=2 G=68 StateTransition Time=253871909721536 Resource=Goroutine(68) Reason="preempted" GoID=68 Running->Runnable
TransitionStack=
        runtime.gcMarkDone @ 0x416865
                /usr/local/google/home/mknyszek/work/go-1/src/runtime/mgc.go:824

Stack=
        runtime.gcMarkDone @ 0x416865
                /usr/local/google/home/mknyszek/work/go-1/src/runtime/mgc.go:824

Unfortunately, there's still the question as to why the bottom frame in the mark worker isn't showing up.