golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.04k stars 17.68k forks source link

runtime: Go 1.14 is extremely slow under debugging on macOS #37528

Closed Cyberax closed 3 years ago

Cyberax commented 4 years ago

What version of Go are you using (go version)?

$ go version
go version go1.14 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/cyberax/Library/Caches/go-build"
GOENV="/Users/cyberax/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/cyberax/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.14/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.14/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/cyberax/aurora/terra/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/ph/3916sjqd66z3tl55r2d5l79h0000gn/T/go-build235180942=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

After upgrade to 1.14 our application became unusable in a debugger with asynchronous preemption turned on. It's sometimes 1000 times slower than with the asyncpreempt=off.

You can try this program: https://play.golang.org/p/k3fnAt18hcO Normally it runs in under <4ms on my computer but under a debugger it runs in 500-2000ms.

What did you expect to see?

Normal speed.

What did you see instead?

Elapsed 1433 ms

dmitshur commented 4 years ago

/cc @aclements @mknyszek

cherrymui commented 4 years ago

What debugger are you using? LLDB?

On my mac, it is clearly slower under lldb but not that slow. I see 10X-30X slowdown.

% ./xxx
Elapsed 4 ms

% lldb ./xxx
(lldb) target create "./xxx"
Current executable set to './xxx' (x86_64).
(lldb) run
Process 42895 launched: './xxx' (x86_64)
Elapsed 135 ms

Process 42895 exited with status = 0 (0x00000000) 
(lldb) run
Process 42899 launched: './xxx' (x86_64)
Elapsed 38 ms

Process 42899 exited with status = 0 (0x00000000) 
(lldb) run
Process 42903 launched: './xxx' (x86_64)
Elapsed 36 ms

Process 42903 exited with status = 0 (0x00000000) 

GDB on Linux doesn't show any slowdown.

Maybe LLDB's signal handling on darwin is not very efficient? It would be interesting to see what the overhead of GDB is on darwin. @dr2chase

dr2chase commented 4 years ago

If anyone can provide detailed actually-working instructions (a shell script would be ideal, "frab the gnobulator"-style instructions are less helpful) on how to get gdb working on Darwin, then I could perhaps answer that question. Till then, I can report that "no slowdown was observed" (because there are no observations at all).

Cyberax commented 4 years ago

I'm using the Delve debugger with default options:

cyberax@CybMac:~/aurora/terra/repro$ go get -u github.com/go-delve/delve/cmd/dlv
...
cyberax@CybMac:~/aurora/terra/repro$ go run github.com/go-delve/delve/cmd/dlv debug main.go
Type 'help' for list of commands.
(dlv) continue
Elapsed 991 ms
Process 72508 has exited with status 0
Cyberax commented 4 years ago

Here's a reproduction script: https://gist.github.com/Cyberax/693f1d1f2c4550478615047e8ca866e0

Output:

cyberax@CybMac:~/aurora/bug$ ./repro.sh
Elapsed 4 ms
Type 'help' for list of commands.
Elapsed 936 ms
Process 3277 has exited with status 0
exit

With preempt off:

cyberax@CybMac:~/aurora/bug$ export GODEBUG=asyncpreemptoff=1
cyberax@CybMac:~/aurora/bug$ ./repro.sh
Elapsed 4 ms
Type 'help' for list of commands.
Elapsed 9 ms
Process 5574 has exited with status 0
exit
dr2chase commented 4 years ago

It has something to do with asynchronous preemption, not sure what. What @derekparker just now said.

Cyberax commented 4 years ago

I think this might have something to do with somewhat unusual stack arrangement. The reproduction case creates an array of slices into a large array. As far as I remember, asyncpreempt treats stack conservatively during GC pauses which would cause a lot of unnecessary work

I've tested this hypothesis by running with GOGC=off and it indeed fixes the issue:

cyberax@CybMac:~/aurora/bug$ ./repro.sh
Elapsed 4 ms
Type 'help' for list of commands.
Elapsed 7 ms
Process 7323 has exited with status 0
exit
cherrymui commented 4 years ago

I don't think conservative stack scan causes this. It is the same amount of work, under the debugger or not.

My guess is that with GC off, there are fewer preemption happened, as a source of preemptions is the GC trying to stop the goroutine to scan the stack or stop the world.

Cyberax commented 3 years ago

The workaround with disabling the asyncpreemt works, closing the issue.