golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.66k stars 17.62k forks source link

runtime/pprof: block/mutex profile stacks can sometimes miss leaf frames #69747

Open nsrip-dd opened 2 weeks ago

nsrip-dd commented 2 weeks ago

Go version

go version go1.23.1 darwin/arm64

Output of go env in your module/workspace:

n/a

What did you do?

Built this program with PGO and ran it:

package main

import (
    "os"
    "runtime"
    "runtime/pprof"
)

func main() {
    runtime.SetBlockProfileRate(1)
    ch := make(chan struct{})
    go func() {
        close(ch)
    }()
    <-ch
    pprof.Lookup("block").WriteTo(os.Stdout, 1)
}

What did you see happen?

If I build with PGO using a profile where runtime.blockevent is hot, then I see this output:

--- contention:
cycles/second=999458200
97416 1 @ 0x4cd357 0x438c4b 0x4709a1
#   0x4cd356    main.main+0x76      /home/junk/simple.go:17
#   0x438c4a    runtime.main+0x28a  /usr/local/go/src/runtime/proc.go:272
Example program which generates such a CPU profile: ```go package main import ( "os" "runtime" "runtime/pprof" "time" _ "unsafe" ) //go:linkname blockevent runtime.blockevent func blockevent(cycles int64, skip int) func main() { runtime.SetBlockProfileRate(1) pprof.StartCPUProfile(os.Stdout) defer pprof.StopCPUProfile() now := time.Now() for time.Since(now) < 5*time.Second { for i := 0; i < 100000; i++ { blockevent(1, 1) } } } ```

What did you expect to see?

I expected to see chanrecv1 in the root frame:

--- contention:
cycles/second=999008970
54083 1 @ 0x406312 0x4cd357 0x438c4b 0x4709a1
#   0x406311    runtime.chanrecv1+0x11  /usr/local/go/src/runtime/chan.go:489
#   0x4cd356    main.main+0x76      /home/junk/simple.go:17
#   0x438c4a    runtime.main+0x28a  /usr/local/go/src/runtime/proc.go:272

I see that if I run the program with frame pointer unwinding disabled (GODEBUG=tracefpunwindoff=1) or if I run it with Go 1.22.

The problem is that if runtime.saveblockevent is inlined into runtime.blockevent, then getfp() will return one of runtime.saveblockevent's callers frame pointers here, and runtime.fpTracebackPartialExpand will start frame pointer unwinding from the wrong frame, making the skip values incorrect. Depending on how much inlining there is, we might see even more missing frames.

gabyhelp commented 2 weeks ago

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

gabyhelp commented 2 weeks ago

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

mknyszek commented 2 weeks ago

In triage, we don't think it's super important to show chanrecv1 here (the line in the user code should already point to the channel operation), but we understand the general concern about maintaining parity with older versions.