golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.83k stars 17.65k forks source link

runtime/pprof: CPU profile writer falls behind in big program #7949

Closed btracey closed 9 years ago

btracey commented 10 years ago
Running pprof on a program that has O(10^6) cpu-bound goroutines loses ~50% of samples.

Code is github.com/btracey/stackmc/examples/paper/rosen_unif

Svg of pprof output: http://stanford.edu/~btracey/gobench/stackmc/graph.svg

Working now on a shorter reproducer

[btracey@zion ~]$ go env
GOARCH="amd64"
GOBIN=""
GOCHAR="6"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/ADL/btracey/mygo"
GORACE=""
GOROOT="/ADL/btracey/gover/go_tip/go"
GOTOOLDIR="/ADL/btracey/gover/go_tip/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"
btracey commented 10 years ago

Comment 1:

Here is a shorter reproducing code. This takes a couple minutes on my machine, but
changing the inner loop computation to size 1000 rather than 100000 did not reproduce
the problem
http://play.golang.org/p/O-ixYe9BdY
ianlancetaylor commented 10 years ago

Comment 2:

I'm pretty sure this was fixed earlier today by https://golang.org/cl/96060043
.  Closing this bug assuming that I am correct;  please re-open if I am not.

Status changed to Fixed.

btracey commented 10 years ago

Comment 3:

I don't think it was fixed by that revision.
[btracey@zion ~]$ go version
go version devel +ad0ee2227396 Wed May 07 18:49:13 2014 +0400 linux/amd64
I downloaded the source after that revision was committed, though I don't know if there
is a lag, and I'm not sure how to check that that CL is included in my version.
ianlancetaylor commented 10 years ago

Comment 4:

You're right, you do have that change (use hg log to see the changes you have).

Labels changed: added repo-main, release-go1.3maybe.

Status changed to New.

btracey commented 10 years ago

Comment 5:

From Dmitry:
The profile is written by a normal goroutine, so because you lots of
runnable goroutine, the profiler goroutine is executed very
episodically.
rsc commented 10 years ago

Comment 6:

I don't expect to change this for Go 1.3 at this point. I'm not really sure what we
should do short of giving extra priority to the writer goroutine (but even that is not
clear).

Labels changed: added release-go1.4, removed release-go1.3maybe.

Status changed to Accepted.

btracey commented 10 years ago

Comment 7:

What about not counting LostProfileData samples, and instead just report a smaller
number of total samples and renormalize the percentages (or having a pprof option to do
so)?
rsc commented 10 years ago

Comment 8:

I don't see any good way to fix this. Unless it comes up more often I'm just going to
live with it.
If you want to just ignore the lost profile data, you can give pprof the option
--ignore=LostProfileData.

Status changed to Unfortunate.