golang / go

The Go programming language
https://go.dev

runtime/cgo: hang in `pthread_cond_wait` during GC start-the-world on macOS #45765

Closed anqurvanillapy closed 1 year ago

anqurvanillapy commented 3 years ago

What version of Go are you using (go version)?

$ go version
go version go1.16.3 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/xxx/Library/Caches/go-build"
GOENV="/Users/xxx/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/xxx/workspace/go/pkg/mod"
GONOPROXY="git.garena.com"
GONOSUMDB="git.garena.com"
GOOS="darwin"
GOPATH="/Users/xxx/workspace/go"
GOPRIVATE="git.garena.com"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.16.3"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/6f/z3n4j3hj1b90vrd9c1m4m4jh0000gy/T/go-build2766702383=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I wrote an nginx module that receives a JSON request and handles it with Cgo functions. In the Cgo handler, I also use an ANTLR4-generated parser to parse some strings. The grammar is fairly complicated and the parser does a lot of recursive work.

Oddly, the parser always returns a result for the first request (I receive the HTTP response), but on the second request it hangs in mallocgc forever.

Below is the partial backtrace from lldb.

It seems that mallocgc ends up in a sleep with a negative duration, so semasleep calls pthread_cond_wait, and the thread hangs there.

What did you expect to see?

I expected my program to get through mallocgc without hanging 😢

What did you see instead?

Partial backtrace via lldb:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff67326882 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff673e7425 libsystem_pthread.dylib`_pthread_cond_wait + 698
    frame #2: 0x00000001068c42f0 ngx_http_xxx_module.so`runtime.pthread_cond_wait_trampoline + 16
    frame #3: 0x00000001068c2250 ngx_http_xxx_module.so`runtime.asmcgocall + 112
    frame #4: 0x00000001068951e0 ngx_http_xxx_module.so`runtime.startTheWorldWithSema + 576
    frame #5: 0x00000001068ae639 ngx_http_xxx_module.so`runtime.pthread_cond_wait + 57
    frame #6: 0x000000010688c44d ngx_http_xxx_module.so`runtime.semasleep + 141
    frame #7: 0x000000010686733b ngx_http_xxx_module.so`runtime.notetsleep_internal + 187
    frame #8: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
    frame #9: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
    frame #10: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
    frame #11: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
    frame #12: 0x000000010688c44d ngx_http_xxx_module.so`runtime.semasleep + 141
    frame #13: 0x000000010686733b ngx_http_xxx_module.so`runtime.notetsleep_internal + 187
    frame #14: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
    frame #15: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
    frame #16: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
    frame #17: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
    frame #18: 0x000000010686733b ngx_http_xxx_module.so`runtime.notetsleep_internal + 187
    frame #19: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
    frame #20: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
    frame #21: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
    frame #22: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
    frame #23: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
    frame #24: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
    frame #25: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
    frame #26: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
    frame #27: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
    frame #28: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
    frame #29: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
    frame #30: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
    frame #31: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
    frame #32: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
    frame #33: 0x00000001068a5fa9 ngx_http_xxx_module.so`runtime.growslice + 489
    frame #34: 0x0000000106935216 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*BaseATNConfigSet).Add + 886
    frame #35: 0x0000000106935216 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*BaseATNConfigSet).Add + 886
    frame #36: 0x0000000106957f62 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 1634
    frame #37: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #38: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #39: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #40: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #41: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #42: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #43: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #44: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #45: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #46: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #47: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #48: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #49: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #50: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #51: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #52: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #53: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #54: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #55: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #56: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #57: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #58: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #59: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #60: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #61: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #62: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #63: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #64: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #65: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #66: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #67: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #68: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #69: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #70: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #71: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #72: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #73: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #74: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #75: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #76: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #77: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #78: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #79: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #80: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #81: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #82: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #83: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #84: 0x0000000106957ccc ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureWork + 972
    frame #85: 0x0000000106957556 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).closureCheckingStopState + 1750
    frame #86: 0x0000000106955270 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).computeStartState + 464
    frame #87: 0x00000001069513a9 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*ParserATNSimulator).AdaptivePredict + 2377
    frame #88: 0x00000001069959ae ngx_http_xxx_module.so`github.com/anqurvanillapy/xxx-module/generated/grammar.(*NgxQLParser).bitExpr + 814
    frame #89: 0x0000000106994685 ngx_http_xxx_module.so`github.com/anqurvanillapy/xxx-module/generated/grammar.(*NgxQLParser).Predicate + 453
    frame #90: 0x0000000106993568 ngx_http_xxx_module.so`github.com/anqurvanillapy/xxx-module/generated/grammar.(*NgxQLParser).boolPrim + 584
    frame #91: 0x0000000106991fcd ngx_http_xxx_module.so`github.com/anqurvanillapy/xxx-module/generated/grammar.(*NgxQLParser).expr + 3373
    frame #92: 0x000000010698b845 ngx_http_xxx_module.so`github.com/anqurvanillapy/xxx-module/generated/grammar.(*NgxQLParser).WhereClause + 485
    frame #93: 0x00000001069805fe ngx_http_xxx_module.so`github.com/anqurvanillapy/xxx-module/generated/grammar.(*NgxQLParser).QuerySpec + 1150
    frame #94: 0x000000010697f656 ngx_http_xxx_module.so`github.com/anqurvanillapy/xxx-module/generated/grammar.(*NgxQLParser).QueryExprBody + 598
    frame #95: 0x000000010697e413 ngx_http_xxx_module.so`github.com/anqurvanillapy/xxx-module/generated/grammar.(*NgxQLParser).QueryExpr + 435
    frame #96: 0x000000010697dc9c ngx_http_xxx_module.so`github.com/anqurvanillapy/xxx-module/generated/grammar.(*NgxQLParser).SelectStmt + 796
    frame #97: 0x000000010697d3af ngx_http_xxx_module.so`github.com/anqurvanillapy/xxx-module/generated/grammar.(*NgxQLParser).Stmt + 943
    frame #98: 0x000000010697c7b0 ngx_http_xxx_module.so`github.com/anqurvanillapy/xxx-module/generated/grammar.(*NgxQLParser).Query + 688

The process sample:

Sampling process 96294 for 3 seconds with 1 millisecond of run time between samples
Sampling completed, processing symbols...
Analysis of sampling nginx (pid 96294) every 1 millisecond
Process:         nginx [96294]
Path:            /Users/USER/*/nginx
Load Address:    0x1066c3000
Identifier:      nginx
Version:         0
Code Type:       X86-64
Parent Process:  nginx [96290]

Date/Time:       2021-04-26 11:48:24.660 +0800
Launch Time:     2021-04-26 11:28:40.561 +0800
OS Version:      Mac OS X 10.15.7 (19H114)
Report Version:  7
Analysis Tool:   /usr/bin/sample

Physical footprint:         5892K
Physical footprint (peak):  5892K
----

Call graph:
    2969 Thread_5240835   DispatchQueue_1: com.apple.main-thread  (serial)
      2969 runtime.asmcgocall  (in ngx_http_xxx_module.so) + 112  [0x1068c2250]  asm_amd64.s:667
        2969 runtime.pthread_cond_wait_trampoline  (in ngx_http_xxx_module.so) + 16  [0x1068c42f0]  sys_darwin_amd64.s:564
          2969 _pthread_cond_wait  (in libsystem_pthread.dylib) + 698  [0x7fff673e7425]
            2969 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff67326882]

Total number in stack (recursive counted multiple, when >=5):

Sort by top of stack, same collapsed (when >= 5):
        __psynch_cvwait  (in libsystem_kernel.dylib)        2969

Binary Images:
       0x1066c3000 -        0x1067baff7 +nginx (0) <2797DEF5-7496-38EC-9D7A-DF9EA84E713E> /Users/*/nginx
       0x106857000 -        0x106a1afff +ngx_http_xxx_module.so (0) <BCE2098B-4125-3FFC-AF3F-F074E2D9AE47> /Users/*/ngx_http_xxx_module.so
       0x113518000 -        0x1135a9f47  dyld (750.6) <F58DDECD-315C-3B7C-8162-A5E8E1D62FE3> /usr/lib/dyld
    0x7fff641c9000 -     0x7fff641cafff  libSystem.B.dylib (1281.100.1) <8E6AD412-91E7-36FC-A6FD-C13B06A4952A> /usr/lib/libSystem.B.dylib
    0x7fff644af000 -     0x7fff64501fff  libc++.1.dylib (902.1) <59A8239F-C28A-3B59-B8FA-11340DC85EDC> /usr/lib/libc++.1.dylib
    0x7fff64502000 -     0x7fff64517ffb  libc++abi.dylib (902) <E692F14F-C65E-303B-9921-BB7E97D77855> /usr/lib/libc++abi.dylib
    0x7fff66027000 -     0x7fff6605afde  libobjc.A.dylib (787.1) <6DF81160-5E7F-3E31-AA1E-C875E3B98AF6> /usr/lib/libobjc.A.dylib
    0x7fff66703000 -     0x7fff66715ff3  libz.1.dylib (76) <793D9643-CD83-3AAC-8B96-88D548FAB620> /usr/lib/libz.1.dylib
    0x7fff66fc4000 -     0x7fff66fc9ff3  libcache.dylib (83) <AF488D13-9E89-35E0-B078-BE37CC5B8586> /usr/lib/system/libcache.dylib
    0x7fff66fca000 -     0x7fff66fd5fff  libcommonCrypto.dylib (60165.120.1) <C7912BE5-993E-3581-B2A0-6AABDC8C5562> /usr/lib/system/libcommonCrypto.dylib
    0x7fff66fd6000 -     0x7fff66fddfff  libcompiler_rt.dylib (101.2) <49B8F644-5705-3F16-BBE0-6FFF9B17C36E> /usr/lib/system/libcompiler_rt.dylib
    0x7fff66fde000 -     0x7fff66fe7ff7  libcopyfile.dylib (166.40.1) <3C481225-21E7-370A-A30E-0CCFDD64A92C> /usr/lib/system/libcopyfile.dylib
    0x7fff66fe8000 -     0x7fff6707afdb  libcorecrypto.dylib (866.140.1) <60567BF8-80FA-359A-B2F3-A3BAEFB288FD> /usr/lib/system/libcorecrypto.dylib
    0x7fff67187000 -     0x7fff671c7ff0  libdispatch.dylib (1173.100.2) <CD9C059C-91D9-30E8-8926-5B9CD0D5D4F5> /usr/lib/system/libdispatch.dylib
    0x7fff671c8000 -     0x7fff671fefff  libdyld.dylib (750.6) <789A18C2-8AC7-3C88-813D-CD674376585D> /usr/lib/system/libdyld.dylib
    0x7fff671ff000 -     0x7fff671ffffb  libkeymgr.dylib (30) <DB3337BE-01CA-3425-BD0C-87774FC0CDC0> /usr/lib/system/libkeymgr.dylib
    0x7fff6720d000 -     0x7fff6720dff7  liblaunch.dylib (1738.140.2) <7200E214-9B4D-3B22-9844-4C7892FC890B> /usr/lib/system/liblaunch.dylib
    0x7fff6720e000 -     0x7fff67213ff7  libmacho.dylib (959.0.1) <AA613A9C-961A-3B67-B696-4622FA59FC4E> /usr/lib/system/libmacho.dylib
    0x7fff67214000 -     0x7fff67216ff3  libquarantine.dylib (110.40.3) <F234E51D-FD0B-3EE4-B679-AE3EE9C536C3> /usr/lib/system/libquarantine.dylib
    0x7fff67217000 -     0x7fff67218ff7  libremovefile.dylib (48) <7C7EFC79-BD24-33EF-B073-06AED234593E> /usr/lib/system/libremovefile.dylib
    0x7fff67219000 -     0x7fff67230ff3  libsystem_asl.dylib (377.60.2) <1563EE02-0657-3B78-99BE-A947C24122EF> /usr/lib/system/libsystem_asl.dylib
    0x7fff67231000 -     0x7fff67231ff7  libsystem_blocks.dylib (74) <0D53847E-AF5F-3ACF-B51F-A15DEA4DEC58> /usr/lib/system/libsystem_blocks.dylib
    0x7fff67232000 -     0x7fff672b9fff  libsystem_c.dylib (1353.100.2) <BBDED5E6-A646-3EED-B33A-91E4331EA063> /usr/lib/system/libsystem_c.dylib
    0x7fff672ba000 -     0x7fff672bdffb  libsystem_configuration.dylib (1061.141.1) <0EE84C33-64FD-372B-974A-AF7A136F2068> /usr/lib/system/libsystem_configuration.dylib
    0x7fff672be000 -     0x7fff672c1fff  libsystem_coreservices.dylib (114) <A199156E-058D-3ABB-BCE9-4B9F20DCED0F> /usr/lib/system/libsystem_coreservices.dylib
    0x7fff672c2000 -     0x7fff672cafff  libsystem_darwin.dylib (1353.100.2) <5B12B5DB-3F30-37C1-8ECC-49A66B1F2864> /usr/lib/system/libsystem_darwin.dylib
    0x7fff672cb000 -     0x7fff672d2fff  libsystem_dnssd.dylib (1096.100.3) <EBB4C2C2-E031-3094-B40A-E67BF261D295> /usr/lib/system/libsystem_dnssd.dylib
    0x7fff672d3000 -     0x7fff672d4ffb  libsystem_featureflags.dylib (17) <29FD922A-EC2C-3F25-BCCC-B58D716E60EC> /usr/lib/system/libsystem_featureflags.dylib
    0x7fff672d5000 -     0x7fff67322ff7  libsystem_info.dylib (538) <8A321605-5480-330B-AF9E-64E65DE61747> /usr/lib/system/libsystem_info.dylib
    0x7fff67323000 -     0x7fff6734fff7  libsystem_kernel.dylib (6153.141.10) <FF092EE8-5BEE-3B9A-8749-F0A067115C7E> /usr/lib/system/libsystem_kernel.dylib
    0x7fff67350000 -     0x7fff67397fff  libsystem_m.dylib (3178) <00F331F1-0D09-39B3-8736-1FE90E64E903> /usr/lib/system/libsystem_m.dylib
    0x7fff67398000 -     0x7fff673bffff  libsystem_malloc.dylib (283.100.6) <8549294E-4C53-36EB-99F3-584A7393D8D5> /usr/lib/system/libsystem_malloc.dylib
    0x7fff673c0000 -     0x7fff673cdffb  libsystem_networkextension.dylib (1095.140.2) <F06C65C5-2CBE-313C-96E1-A09240F9FE57> /usr/lib/system/libsystem_networkextension.dylib
    0x7fff673ce000 -     0x7fff673d7ff7  libsystem_notify.dylib (241.100.2) <FA22F928-D91B-3AA5-96BB-3186AC0FB264> /usr/lib/system/libsystem_notify.dylib
    0x7fff673d8000 -     0x7fff673e0fef  libsystem_platform.dylib (220.100.1) <009A7C1F-313A-318E-B9F2-30F4C06FEA5C> /usr/lib/system/libsystem_platform.dylib
    0x7fff673e1000 -     0x7fff673ebfff  libsystem_pthread.dylib (416.100.3) <62CB1A98-0B8F-31E7-A02B-A1139927F61D> /usr/lib/system/libsystem_pthread.dylib
    0x7fff673ec000 -     0x7fff673f0ff3  libsystem_sandbox.dylib (1217.141.2) <051C4018-4345-3034-AC98-6DE42FB8273B> /usr/lib/system/libsystem_sandbox.dylib
    0x7fff673f1000 -     0x7fff673f3fff  libsystem_secinit.dylib (62.100.2) <F80872AA-E1FD-3D7E-8729-467656EC6561> /usr/lib/system/libsystem_secinit.dylib
    0x7fff673f4000 -     0x7fff673fbffb  libsystem_symptoms.dylib (1238.120.2) <702D0910-5C34-3D43-9631-8BD215DE4FE1> /usr/lib/system/libsystem_symptoms.dylib
    0x7fff673fc000 -     0x7fff67412ff2  libsystem_trace.dylib (1147.120.1) <BC141783-66D9-3137-A783-211B38E49ADB> /usr/lib/system/libsystem_trace.dylib
    0x7fff67414000 -     0x7fff67419ff7  libunwind.dylib (35.4) <42B7B509-BAFE-365B-893A-72414C92F5BF> /usr/lib/system/libunwind.dylib
    0x7fff6741a000 -     0x7fff6744fffe  libxpc.dylib (1738.140.2) <54EEF402-42C7-3995-BADE-93C48EFC4452> /usr/lib/system/libxpc.dylib
Sample analysis of process 96294 written to file /dev/stdout

cherrymui commented 3 years ago

cc @mknyszek @ianlancetaylor

mknyszek commented 3 years ago

The backtrace is quite confusing, but it seems to be stuck waiting for the world to start again?

ianlancetaylor commented 3 years ago

We can't trust lldb to provide a correct stack trace for a Go program. The stack trace that is shown is clearly impossible, and we don't know which parts are correct and which parts are not.

cherrymui commented 3 years ago

> We can't trust lldb to provide a correct stack trace for a Go program. The stack trace that is shown is clearly impossible, and we don't know which parts are correct and which parts are not.

Everything below runtime.asmcgocall is probably incorrect.
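
One way to get stacks that do not depend on lldb's unwinder is to ask the Go runtime for its own view of all goroutines via `runtime.Stack`. The sketch below is only an illustration: the helper name `dumpGoroutines` is hypothetical, and because `runtime.Stack` with `all=true` has to stop the world, it may itself block if the runtime is already wedged in the way this issue describes.

```go
package main

import (
	"fmt"
	"runtime"
)

// dumpGoroutines returns the runtime's own text dump of every goroutine
// stack. The helper name is hypothetical; runtime.Stack is the real API.
func dumpGoroutines() string {
	buf := make([]byte, 1<<16)
	for {
		n := runtime.Stack(buf, true) // true: include all goroutines
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf)) // buffer was too small; retry
	}
}

func main() {
	done := make(chan struct{})
	go func() { <-done }() // a parked goroutine so the dump has more than one stack
	runtime.Gosched()

	fmt.Print(dumpGoroutines())
	close(done)
}
```

In a Cgo module, a helper like this would have to be called from a Go code path that is still able to run, for example before the request that triggers the hang.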

anqurvanillapy commented 3 years ago

The backtrace seems to repeat some of the callers after mallocgc; it could be split like this:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff67326882 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff673e7425 libsystem_pthread.dylib`_pthread_cond_wait + 698
    frame #2: 0x00000001068c42f0 ngx_http_xxx_module.so`runtime.pthread_cond_wait_trampoline + 16
    frame #3: 0x00000001068c2250 ngx_http_xxx_module.so`runtime.asmcgocall + 112
    frame #4: 0x00000001068951e0 ngx_http_xxx_module.so`runtime.startTheWorldWithSema + 576
    frame #5: 0x00000001068ae639 ngx_http_xxx_module.so`runtime.pthread_cond_wait + 57
    frame #6: 0x000000010688c44d ngx_http_xxx_module.so`runtime.semasleep + 141
    frame #7: 0x000000010686733b ngx_http_xxx_module.so`runtime.notetsleep_internal + 187
    frame #8: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
    frame #9: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
    frame #10: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
    frame #11: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
    frame #12: 0x000000010688c44d ngx_http_xxx_module.so`runtime.semasleep + 141
    frame #13: 0x000000010686733b ngx_http_xxx_module.so`runtime.notetsleep_internal + 187
    frame #14: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
    frame #15: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
    frame #16: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
    frame #17: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
    frame #18: 0x000000010686733b ngx_http_xxx_module.so`runtime.notetsleep_internal + 187
    frame #19: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
    frame #20: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
    frame #21: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
    frame #22: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
    frame #23: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
    frame #24: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
    frame #25: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
    frame #26: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
    frame #27: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
    frame #28: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
    frame #29: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
    frame #30: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
    frame #31: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
    frame #32: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
    frame #33: 0x00000001068a5fa9 ngx_http_xxx_module.so`runtime.growslice + 489
    frame #34: 0x0000000106935216 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*BaseATNConfigSet).Add + 886

So in my opinion, this part might be trustworthy?

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff67326882 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff673e7425 libsystem_pthread.dylib`_pthread_cond_wait + 698
    frame #2: 0x00000001068c42f0 ngx_http_xxx_module.so`runtime.pthread_cond_wait_trampoline + 16
    frame #3: 0x00000001068c2250 ngx_http_xxx_module.so`runtime.asmcgocall + 112
    frame #4: 0x00000001068951e0 ngx_http_xxx_module.so`runtime.startTheWorldWithSema + 576
    frame #5: 0x00000001068ae639 ngx_http_xxx_module.so`runtime.pthread_cond_wait + 57
    frame #6: 0x000000010688c44d ngx_http_xxx_module.so`runtime.semasleep + 141
    frame #7: 0x000000010686733b ngx_http_xxx_module.so`runtime.notetsleep_internal + 187
    frame #8: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
    frame #9: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
    frame #10: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
    frame #11: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173

cherrymui commented 3 years ago

No, it cannot. E.g.

 frame #4: 0x00000001068951e0 ngx_http_xxx_module.so`runtime.startTheWorldWithSema + 576
 frame #5: 0x00000001068ae639 ngx_http_xxx_module.so`runtime.pthread_cond_wait + 57

Clearly pthread_cond_wait does not call startTheWorldWithSema.

anqurvanillapy commented 3 years ago

> No, it cannot. E.g.
>
>  frame #4: 0x00000001068951e0 ngx_http_xxx_module.so`runtime.startTheWorldWithSema + 576
>  frame #5: 0x00000001068ae639 ngx_http_xxx_module.so`runtime.pthread_cond_wait + 57
>
> Clearly pthread_cond_wait does not call startTheWorldWithSema.

Oh, I see. So the backtrace seems to be a jumble of call stacks from several goroutines?

Now I suspect that nginx blocks some threads that the Go runtime needs for GC work, so it hangs in the cond wait 🤔. Later I will try GODEBUG=gctrace=1, read the Go runtime source code, and use some tools to verify the thread states.