Closed anqurvanillapy closed 1 year ago
cc @mknyszek @ianlancetaylor
The backtrace is quite confusing, but it seems to be stuck in waiting for the world to start again?
We can't trust lldb to provide a correct stack trace for a Go program. The stack trace that is shown is clearly impossible, and we don't know which parts are correct and which parts are not.
We can't trust lldb to provide a correct stack trace for a Go program. The stack trace that is shown is clearly impossible, and we don't know which parts are correct and which parts are not.
Everything below runtime.asmcgocall is probably incorrect.
The backtrace seems repeating some of the callers after mallocgc
, it could be split like:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00007fff67326882 libsystem_kernel.dylib`__psynch_cvwait + 10
frame #1: 0x00007fff673e7425 libsystem_pthread.dylib`_pthread_cond_wait + 698
frame #2: 0x00000001068c42f0 ngx_http_xxx_module.so`runtime.pthread_cond_wait_trampoline + 16
frame #3: 0x00000001068c2250 ngx_http_xxx_module.so`runtime.asmcgocall + 112
frame #4: 0x00000001068951e0 ngx_http_xxx_module.so`runtime.startTheWorldWithSema + 576
frame #5: 0x00000001068ae639 ngx_http_xxx_module.so`runtime.pthread_cond_wait + 57
frame #6: 0x000000010688c44d ngx_http_xxx_module.so`runtime.semasleep + 141
frame #7: 0x000000010686733b ngx_http_xxx_module.so`runtime.notetsleep_internal + 187
frame #8: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
frame #9: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
frame #10: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
frame #11: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
frame #12: 0x000000010688c44d ngx_http_xxx_module.so`runtime.semasleep + 141
frame #13: 0x000000010686733b ngx_http_xxx_module.so`runtime.notetsleep_internal + 187
frame #14: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
frame #15: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
frame #16: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
frame #17: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
frame #18: 0x000000010686733b ngx_http_xxx_module.so`runtime.notetsleep_internal + 187
frame #19: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
frame #20: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
frame #21: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
frame #22: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
frame #23: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
frame #24: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
frame #25: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
frame #26: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
frame #27: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
frame #28: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
frame #29: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
frame #30: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
frame #31: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
frame #32: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
---
frame #33: 0x00000001068a5fa9 ngx_http_xxx_module.so`runtime.growslice + 489
frame #34: 0x0000000106935216 ngx_http_xxx_module.so`github.com/antlr/antlr4/runtime/Go/antlr.(*BaseATNConfigSet).Add + 886
So in my opnion this might be trusted?...
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00007fff67326882 libsystem_kernel.dylib`__psynch_cvwait + 10
frame #1: 0x00007fff673e7425 libsystem_pthread.dylib`_pthread_cond_wait + 698
frame #2: 0x00000001068c42f0 ngx_http_xxx_module.so`runtime.pthread_cond_wait_trampoline + 16
frame #3: 0x00000001068c2250 ngx_http_xxx_module.so`runtime.asmcgocall + 112
frame #4: 0x00000001068951e0 ngx_http_xxx_module.so`runtime.startTheWorldWithSema + 576
frame #5: 0x00000001068ae639 ngx_http_xxx_module.so`runtime.pthread_cond_wait + 57
frame #6: 0x000000010688c44d ngx_http_xxx_module.so`runtime.semasleep + 141
frame #7: 0x000000010686733b ngx_http_xxx_module.so`runtime.notetsleep_internal + 187
frame #8: 0x0000000106867625 ngx_http_xxx_module.so`runtime.notetsleepg + 101
frame #9: 0x0000000106877d10 ngx_http_xxx_module.so`runtime.gcBgMarkStartWorkers + 80
frame #10: 0x00000001068766ea ngx_http_xxx_module.so`runtime.gcStart + 458
frame #11: 0x0000000106868bf5 ngx_http_xxx_module.so`runtime.mallocgc + 1173
No, it cannot. E.g.
frame #4: 0x00000001068951e0 ngx_http_xxx_module.so`runtime.startTheWorldWithSema + 576 frame #5: 0x00000001068ae639 ngx_http_xxx_module.so`runtime.pthread_cond_wait + 57
Clearly pthread_cond_wait does not call startTheWorldWithSema.
No, it cannot. E.g.
frame #4: 0x00000001068951e0 ngx_http_xxx_module.so`runtime.startTheWorldWithSema + 576 frame #5: 0x00000001068ae639 ngx_http_xxx_module.so`runtime.pthread_cond_wait + 57
Clearly pthread_cond_wait does not call startTheWorldWithSema.
Oh, I see, it seems like the backtrace is messed with callstacks from many goroutines?...
Now I doubt that nginx blocks some threads where Cgo needs to do GC works so it hangs at the cond wait 🤔. I will try GODEBUG=gctrace=1
, reading Go runtime source code and using some tools to verify the thread states later.
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
I wrote an nginx module that receives a JSON request and handles it with Cgo functions. In the Cgo handler, I also used an ANTLR4-generated parser to parse some strings. The grammar is a bit complicated and it's doing a lot of recursive works.
It's weird that the parser could always give the results for the first time (receiving the HTTP response), but for the second time, it hangs at the
mallocgc
forever.Below is the partial backtrace from
lldb
.It seems like
mallocgc
triggers a sleep with a negative duration so it calls thepthread_cond_wait
in functionsemasleep
, and it hangs.What did you expect to see?
My program could safely pass through the
mallocgc
😢What did you see instead?
Partial backtrace via
lldb
:The process sample: