Open gopherbot opened 3 days ago
Found new dashboard test flakes for:
#!watchflakes
default <- pkg == "runtime/pprof" && test == "TestProfilerStackDepth/heap"
I don't think this is darwin/arm64 specific. This is failing because of the elided runtime frames at the end of the stack. If I dump the runtime.MemProfile stacks on failure, which don't trim runtime frames, we can sometimes get a stack like this:
runtime.acquireSudog /Users/nick.ripley/repos/go/src/runtime/proc.go:484
runtime.semacquire1 /Users/nick.ripley/repos/go/src/runtime/sema.go:149
runtime.semacquire /Users/nick.ripley/repos/go/src/runtime/sema.go:129
runtime.gcMarkDone /Users/nick.ripley/repos/go/src/runtime/mgc.go:827
runtime.gcAssistAlloc /Users/nick.ripley/repos/go/src/runtime/mgcmark.go:552
runtime.deductAssistCredit /Users/nick.ripley/repos/go/src/runtime/malloc.go:1679
runtime.mallocgc /Users/nick.ripley/repos/go/src/runtime/malloc.go:1044
runtime.convTslice /Users/nick.ripley/repos/go/src/runtime/iface.go:443
runtime/pprof.allocDeep /Users/nick.ripley/repos/go/src/runtime/pprof/pprof_test.go:2601
runtime/pprof.allocDeep /Users/nick.ripley/repos/go/src/runtime/pprof/pprof_test.go:2598
runtime/pprof.allocDeep /Users/nick.ripley/repos/go/src/runtime/pprof/pprof_test.go:2598
runtime/pprof.allocDeep /Users/nick.ripley/repos/go/src/runtime/pprof/pprof_test.go:2598
... etc ...
Those runtime.* frames count against the stack depth limit when recording the stack, but get removed from the pprof output. We can probably change the test to account for this. Right now it only checks the first stack with allocDeep frames, which might be this one. We could check all the allocDeep stacks, since at least one should correspond to the large allocation the test is really looking for.
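To illustrate the difference between the two checks, here is a minimal sketch on plain function-name slices. It is not the test's actual code: the helper names and the depth limit of 8 are hypothetical, chosen only to keep the example small.

```go
package main

import "fmt"

// countFrames returns how many frames in stack are the function fn.
func countFrames(stack []string, fn string) int {
	n := 0
	for _, f := range stack {
		if f == fn {
			n++
		}
	}
	return n
}

// firstStackOK mimics checking only the first stack that contains fn.
func firstStackOK(stacks [][]string, fn string, want int) bool {
	for _, s := range stacks {
		if c := countFrames(s, fn); c > 0 {
			return c >= want
		}
	}
	return false
}

// anyStackOK is the relaxed check: it passes if any stack has enough fn frames.
func anyStackOK(stacks [][]string, fn string, want int) bool {
	for _, s := range stacks {
		if countFrames(s, fn) >= want {
			return true
		}
	}
	return false
}

// repeat builds a stack made of n copies of fn.
func repeat(fn string, n int) []string {
	s := make([]string, n)
	for i := range s {
		s[i] = fn
	}
	return s
}

func main() {
	const fn = "runtime/pprof.allocDeep"
	const want = 8 // hypothetical depth limit, for illustration only

	stacks := [][]string{
		// Three slots went to runtime frames that were later trimmed,
		// so only want-3 allocDeep frames remain in the output.
		repeat(fn, want-3),
		// The allocation the test is really looking for.
		repeat(fn, want),
	}

	fmt.Println(firstStackOK(stacks, fn, want)) // false
	fmt.Println(anyStackOK(stacks, fn, want))   // true
}
```

With the first-stack check, the outcome depends on which stack happens to come first in the profile. The relaxed check does not care about ordering.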
Change https://go.dev/cl/623998 mentions this issue: runtime/pprof: relax TestProfilerStackDepth
Whoa, I didn't realize mallocgc was allowed to allocate inside of itself. But upon closer inspection it turns out that's indeed possible until mp.mallocing is set, which makes sense. Thanks for figuring this out and submitting a fix. I reviewed the CL - LGTM 🙇.
Also: I think we've been bitten by the elided runtime frames several times now when it comes to writing reliable tests for profiling. This alone may not be sufficient reason to get rid of them, but I also feel that in situations like these they actually hide important allocation information from the user. At least personally, I'd very much like to know if something is allocating in the runtime, especially if it explains unexpected allocation sizes being reported in pprof. The way this is currently presented to the user is very confusing IMO.
Issue created automatically to collect these failures.
Example (log):
— watchflakes