golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.85k stars 17.65k forks source link

runtime/pprof: TestProfilerStackDepth/heap failures #70112

Open gopherbot opened 3 days ago

gopherbot commented 3 days ago
#!watchflakes
default <- pkg == "runtime/pprof" && test == "TestProfilerStackDepth/heap"

Issue created automatically to collect these failures.

Example (log):

=== RUN   TestProfilerStackDepth/heap
    pprof_test.go:2545: Profile = PeriodType: space bytes
        Period: 1
        Time: 2024-10-29 09:29:44.147981 -0700 PDT
        Samples:
        alloc_objects/count alloc_space/bytes inuse_objects/count inuse_space/bytes
                  0          0          0          0: 1 2 3 4 5 
                  0          0          0          0: 6 5 
                  0          0          0          0: 7 8 5 
                  0          0          0          0: 9 10 8 5 
...
           763: 0x100af4087 M=1 runtime.mstart1 /Volumes/Work/s/w/ir/x/w/goroot/src/runtime/proc.go:1852 s=1818
           764: 0x100af3fb3 M=1 runtime.mstart0 /Volumes/Work/s/w/ir/x/w/goroot/src/runtime/proc.go:1802 s=1776
           765: 0x100b2bdff M=1 runtime.mstart /Volumes/Work/s/w/ir/x/w/goroot/src/runtime/asm_arm64.s:166 s=165
           766: 0x100afd49f M=1 runtime.procresize /Volumes/Work/s/w/ir/x/w/goroot/src/runtime/proc.go:5752 s=5694
           767: 0x100af2067 M=1 runtime.schedinit /Volumes/Work/s/w/ir/x/w/goroot/src/runtime/proc.go:873 s=793
           768: 0x100b2bd8b M=1 runtime.rt0_go /Volumes/Work/s/w/ir/x/w/goroot/src/runtime/asm_arm64.s:106 s=16
        Mappings
        1: 0x100ab8000/0x100c7c000/0x0 /Volumes/Work/s/w/ir/x/t/go-build3551365136/b1697/pprof.test  [FN]
    pprof_test.go:2556: want stack depth = 128, got 121
--- FAIL: TestProfilerStackDepth/heap (0.14s)

watchflakes

gopherbot commented 3 days ago

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "runtime/pprof" && test == "TestProfilerStackDepth/heap"
2024-10-29 16:24 gotip-darwin-arm64_13 go@f5526b56 runtime/pprof.TestProfilerStackDepth/heap (log) === RUN TestProfilerStackDepth/heap pprof_test.go:2545: Profile = PeriodType: space bytes Period: 1 Time: 2024-10-29 09:29:44.147981 -0700 PDT Samples: alloc_objects/count alloc_space/bytes inuse_objects/count inuse_space/bytes 0 0 0 0: 1 2 3 4 5 0 0 0 0: 6 5 0 0 0 0: 7 8 5 0 0 0 0: 9 10 8 5 ... 763: 0x100af4087 M=1 runtime.mstart1 /Volumes/Work/s/w/ir/x/w/goroot/src/runtime/proc.go:1852 s=1818 764: 0x100af3fb3 M=1 runtime.mstart0 /Volumes/Work/s/w/ir/x/w/goroot/src/runtime/proc.go:1802 s=1776 765: 0x100b2bdff M=1 runtime.mstart /Volumes/Work/s/w/ir/x/w/goroot/src/runtime/asm_arm64.s:166 s=165 766: 0x100afd49f M=1 runtime.procresize /Volumes/Work/s/w/ir/x/w/goroot/src/runtime/proc.go:5752 s=5694 767: 0x100af2067 M=1 runtime.schedinit /Volumes/Work/s/w/ir/x/w/goroot/src/runtime/proc.go:873 s=793 768: 0x100b2bd8b M=1 runtime.rt0_go /Volumes/Work/s/w/ir/x/w/goroot/src/runtime/asm_arm64.s:106 s=16 Mappings 1: 0x100ab8000/0x100c7c000/0x0 /Volumes/Work/s/w/ir/x/t/go-build3551365136/b1697/pprof.test [FN] pprof_test.go:2556: want stack depth = 128, got 121 --- FAIL: TestProfilerStackDepth/heap (0.14s)

watchflakes

nsrip-dd commented 20 hours ago

I don't think this is darwin/arm64 specific? This is failing because of the elided runtime frames at the end of the stack. If I dump the runtime.MemProfile stacks on failure, which don't trim runtime frames, we sometimes can get a stack like this:

runtime.acquireSudog /Users/nick.ripley/repos/go/src/runtime/proc.go:484
runtime.semacquire1 /Users/nick.ripley/repos/go/src/runtime/sema.go:149
runtime.semacquire /Users/nick.ripley/repos/go/src/runtime/sema.go:129
runtime.gcMarkDone /Users/nick.ripley/repos/go/src/runtime/mgc.go:827
runtime.gcAssistAlloc /Users/nick.ripley/repos/go/src/runtime/mgcmark.go:552
runtime.deductAssistCredit /Users/nick.ripley/repos/go/src/runtime/malloc.go:1679
runtime.mallocgc /Users/nick.ripley/repos/go/src/runtime/malloc.go:1044
runtime.convTslice /Users/nick.ripley/repos/go/src/runtime/iface.go:443
runtime/pprof.allocDeep /Users/nick.ripley/repos/go/src/runtime/pprof/pprof_test.go:2601
runtime/pprof.allocDeep /Users/nick.ripley/repos/go/src/runtime/pprof/pprof_test.go:2598
runtime/pprof.allocDeep /Users/nick.ripley/repos/go/src/runtime/pprof/pprof_test.go:2598
runtime/pprof.allocDeep /Users/nick.ripley/repos/go/src/runtime/pprof/pprof_test.go:2598
... etc ... 

Those runtime.* frames count against the stack depth limit when recording the stack, but get removed from the pprof output. We can probably change the test to check to account for this. Right now it only checks the first stack with allocDeep frames, which might be this one. We could check all the allocDeep stacks, since at least one should correspond to the large allocation the test is really looking for.

gopherbot commented 19 hours ago

Change https://go.dev/cl/623998 mentions this issue: runtime/pprof: relax TestProfilerStackDepth

felixge commented 51 minutes ago

Woah, I didn't realize mallocgc was allowed to allocate inside of itself. But upon closer inspection it turns out that's indeed possible until mp.mallocing is set, which makes sense. Thanks for figuring this out and submitting a fix. I reviewed the CL - LGTM 🙇.

Also: I think we've been bitten by the elided runtime frames several times now when it comes to writing reliable tests for profiling. This alone may not be sufficient reason to get rid of them, but I also feel that in situations like these they actually hide important allocation information from the user. At least personally, I'd very much like to know if something is allocating in the runtime. Especially if it explains unexpected allocation sizes being report in pprof. The way this is currently presented to the user is very confusing IMO.