Closed gopherbot closed 9 years ago
Sure this is only ARM? On linux-386-387 we just got: http://build.golang.org/log/8bc3fb574de73d621f655b4a6ad27f920bfe67d1 runtime: unexpected return pc for runtime.cgocallbackg called from 0x0 fatal error: unknown caller pc runtime stack: runtime.throw(0x8275671) /usr/local/go/src/pkg/runtime/panic.c:520 +0x71 runtime.gentraceback(0x80503ad, 0xb74250d0, 0x0, 0x18402e60, 0x0, 0x0, 0x7fffffff, 0x8064220, 0xbfc55498, 0x8063e00) /usr/local/go/src/pkg/runtime/traceback_x86.c:167 +0x667 copyabletopsegment(0x18402e60) /usr/local/go/src/pkg/runtime/stack.c:332 +0x95 runtime.newstack() /usr/local/go/src/pkg/runtime/stack.c:746 +0x3a4 runtime.morestack() /usr/local/go/src/pkg/runtime/asm_386.s:246 +0x68 goroutine 67 [stack growth]: runtime: unexpected return pc for runtime.cgocallbackg called from 0x0 runtime.cgocallbackg() /usr/local/go/src/pkg/runtime/cgocall.c:244 +0x4d fp=0xb74250d8 created by testing.RunTests /usr/local/go/src/pkg/testing/testing.go:504 +0x6d0 goroutine 16 [chan receive]: testing.RunTests(0x81d2dc8, 0x82762a0, 0x2f, 0x2f, 0x806f001) /usr/local/go/src/pkg/testing/testing.go:505 +0x70c testing.Main(0x81d2dc8, 0x82762a0, 0x2f, 0x2f, 0x8272710, 0x1, 0x1, 0x828a060, 0x0, 0x0) /usr/local/go/src/pkg/testing/testing.go:435 +0x6e main.main() _/tmp/gobuilder/linux-386-387-238ff8c01b5b/go/misc/cgo/test/_test/_testmain.go:141 +0x86 goroutine 19 [finalizer wait]: runtime.park(0x805ec30, 0x82843e4, 0x82785e9) /usr/local/go/src/pkg/runtime/proc.c:1369 +0x94 runtime.parkunlock(0x82843e4, 0x82785e9) /usr/local/go/src/pkg/runtime/proc.c:1385 +0x3f runfinq() /usr/local/go/src/pkg/runtime/mgc0.c:2636 +0xc5 runtime.goexit() /usr/local/go/src/pkg/runtime/proc.c:1445 goroutine 20 [syscall]: os/signal.loop() /usr/local/go/src/pkg/os/signal/signal_unix.go:21 +0x24 created by os/signal.init·1 /usr/local/go/src/pkg/os/signal/signal_unix.go:27 +0x37 goroutine 17 [syscall]: runtime.cgocallbackg() /usr/local/go/src/pkg/runtime/cgocall.c:244 +0x4d runtime.cgocallback_gofunc(0x0, 0x0, 0x0) /usr/local/go/src/pkg/runtime/asm_386.s:690 +0x5a runtime.goexit() /usr/local/go/src/pkg/runtime/proc.c:1445 goroutine 50 [syscall]: runtime.goexit() /usr/local/go/src/pkg/runtime/proc.c:1445 exit status 2 FAIL _/tmp/gobuilder/linux-386-387-238ff8c01b5b/go/misc/cgo/test 2.167s Build complete, duration 13m16.183272956s. Result: error: exit status 1
So the crash is happening for me in gentraceback. I added some prints and stuff to the code to try to debug further. Here is one such output: starting gentraceback(0xffffffff, 0xffffffff, 0x0, 0x10602640, 0, 0x0, 2147483647, 0x19324, 0x41db8bd0, 0) used from sycallstack=0x404b3fd0 pc0=0x14d24 sp0=0x404b3ee8 pc=0x14d24 lr=0x41f9c fp=0x404b3ef0 sp=0x404b3ee8 argp=0x404b3ef4 arglen=0 runtime.cgocallbackg pc=0x41f9c lr=0x41ee4 fp=0x404b3efc sp=0x404b3ef0 argp=0x404b3f00 arglen=12 runtime.cgocallback_gofunc pc=0x41ee4 lr=0x26954 fp=0x404b3f00 sp=0x404b3efc argp=0x404b3f04 arglen=8 runtime.asmcgocall pc=0x26954 lr=0x11629c fp=0x404b3f00 sp=0x404b3f00 argp=0x404b3f04 arglen=4 exitsyscall0 pc=0x11629c lr=0x11629c fp=0x404b3f14 sp=0x404b3f00 argp=0x404b3f18 arglen=16 github.com/spacemonkeygo/openssl._C2func_SSL_do_handshake pc=0x11629c lr=0x404b3f01 fp=0x404b3f28 sp=0x404b3f14 I'm unable to find a shorter reproduction still, but I think this happens when a gentraceback is called on a goroutine that is returning from a C->Go callback in the cgocallbackg function right before/after it hits entersyscall. I have no idea if this theory is possible or makes sense. I have also discovered that if the bio free does not call back into Go, the repro fails to fail. Since this clearly doesn't fix the underlying cause, I still think there's a bug on the Go side of things here (it doesn't look like there's stack corruption from the C code, but I could be way wrong there too). This commit shows the change: https://github.com/spacemonkeygo/openssl/commit/05477109fd870990155ef0806ab04e91312412f Hopefully this information helps, and the offer to set up a debug environment or to provide any information I can still stands. Any suggestions on where to look or strategies used to debug this stuff I'd also gladly accept :)
I'm having the same issues. Program completes successfully on 1.2.2 but crashes with above message on tip. This is OSX, 10.8.5... I'm working towards a shorter reproducible case but can't promise much. if you have any ideas how to help debug this please share. runtime: unexpected return pc for runtime.cgocallbackg called from 0x0 fatal error: unknown caller pc runtime stack: runtime.throw(0x4aab451) /Users/aam/go/src/pkg/runtime/panic.c:523 +0x77 fp=0xb0621ba0 sp=0xb0621b88 runtime.gentraceback(0x473a8a3, 0x4ebcd40, 0x0, 0xc208064b40, 0x0, 0x0, 0xaa8280007fffffff, 0x473edc0, 0xb0621d78, 0xaa8280000596c000) /Users/aam/go/src/pkg/runtime/traceback_x86.c:180 +0xbb7 fp=0xb0621c88 sp=0xb0621ba0 addstackroots(0xc208064b40, 0xb0621d78) /Users/aam/go/src/pkg/runtime/mgc0.c:1658 +0x14a fp=0xb0621d08 sp=0xb0621c88 markroot(0xc208018480, 0xf) /Users/aam/go/src/pkg/runtime/mgc0.c:1296 +0xbe fp=0xb0621d88 sp=0xb0621d08 runtime.parfordo(0xc208018480) /Users/aam/go/src/pkg/runtime/parfor.c:103 +0x128 fp=0xb0621e00 sp=0xb0621d88 runtime.gchelper() /Users/aam/go/src/pkg/runtime/mgc0.c:2084 +0x46 fp=0xb0621e20 sp=0xb0621e00 stopm() /Users/aam/go/src/pkg/runtime/proc.c:961 +0x149 fp=0xb0621e38 sp=0xb0621e20 startlockedm(0xc208064b40) /Users/aam/go/src/pkg/runtime/proc.c:1104 +0xa8 fp=0xb0621e58 sp=0xb0621e38 schedule() /Users/aam/go/src/pkg/runtime/proc.c:1356 +0x127 fp=0xb0621e80 sp=0xb0621e58 park0(0xc208002120) /Users/aam/go/src/pkg/runtime/proc.c:1414 +0x133 fp=0xb0621ea0 sp=0xb0621e80 runtime.mcall(0x47333cd) /Users/aam/go/src/pkg/runtime/asm_amd64.s:183 +0x52 fp=0xb0621eb0 sp=0xb0621ea0 goroutine 16 [runnable]: syscall.Syscall6(0x158, 0xa, 0xc20817e000, 0x1000, 0xc208176b68, 0x0, 0x0, 0x0, 0x0, 0x0) /Users/aam/go/src/pkg/syscall/asm_darwin_amd64.s:44 +0x5 fp=0xc208176ac0 sp=0xc208176ab8 syscall.Getdirentries(0xa, 0xc20817e000, 0x1000, 0x1000, 0xc208176b68, 0x10, 0x0, 0x0) /Users/aam/go/src/pkg/syscall/zsyscall_darwin_amd64.go:560 +0x97 fp=0xc208176b28 sp=0xc208176ac0 syscall.ReadDirent(0xa, 0xc20817e000, 0x1000, 0x1000, 0xc208034680, 0x0, 0x0) /Users/aam/go/src/pkg/syscall/syscall_bsd.go:71 +0x78 fp=0xc208176b80 sp=0xc208176b28 os.(*File).readdirnames(0xc2081ea018, 0xfffffffffffffffc, 0xc208034680, 0x3, 0x64, 0x0, 0x0) /Users/aam/go/src/pkg/os/dir_unix.go:39 +0x1c9 fp=0xc208176c60 sp=0xc208176b80 os.(*File).Readdirnames(0xc2081ea018, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0) /Users/aam/go/src/pkg/os/doc.go:134 +0xa5 fp=0xc208176ca0 sp=0xc208176c60 os.(*File).readdir(0xc2081ea018, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0) /Users/aam/go/src/pkg/os/file_unix.go:158 +0xca fp=0xc208176da8 sp=0xc208176ca0 os.(*File).Readdir(0xc2081ea018, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0) /Users/aam/go/src/pkg/os/doc.go:115 +0xa5 fp=0xc208176de8 sp=0xc208176da8 io/ioutil.ReadDir(0xc20818a200, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0) /Users/aam/go/src/pkg/io/ioutil/ioutil.go:105 +0xed fp=0xc208176e70 sp=0xc208176de8
Comment 9 by hector@marcansoft.com:
We're seeing this too, on linux-amd64, again on cgo code with callbacks back into go (in this case, a libjpeg wrapper). I don't have a good test case yet though. I've abbreviated the dump; this happened during load testing of a server so there are lots of goroutines alive. runtime: unexpected return pc for runtime.cgocallbackg called from 0x0 fatal error: unknown caller pc runtime stack: runtime.throw(0xa695d1) /usr/local/go/src/pkg/runtime/panic.c:520 +0x69 fp=0x7fe4d5642b60 sp=0x7fe4d5642b48 runtime.gentraceback(0x406f0c, 0xc20821d7a8, 0x0, 0xc20833e6c0, 0x7fe400000000, 0x0, 0x7fffffff, 0x40b4c0, 0x7fe4d5642d48, 0x2ac0ffe4da1ce000) /usr/local/go/src/pkg/runtime/traceback_x86.c:180 +0xba9 fp=0x7fe4d5642c48 sp=0x7fe4d5642b60 addstackroots(0xc20833e6c0, 0x7fe4d5642d48) /usr/local/go/src/pkg/runtime/mgc0.c:1692 +0x168 fp=0x7fe4d5642cd8 sp=0x7fe4d5642c48 markroot(0xc208018480, 0x89) /usr/local/go/src/pkg/runtime/mgc0.c:1321 +0xbe fp=0x7fe4d5642d58 sp=0x7fe4d5642cd8 runtime.parfordo(0xc208018480) /usr/local/go/src/pkg/runtime/parfor.c:103 +0x128 fp=0x7fe4d5642dd0 sp=0x7fe4d5642d58 runtime.gchelper() /usr/local/go/src/pkg/runtime/mgc0.c:2118 +0x3f fp=0x7fe4d5642df0 sp=0x7fe4d5642dd0 stopm() /usr/local/go/src/pkg/runtime/proc.c:957 +0x111 fp=0x7fe4d5642e08 sp=0x7fe4d5642df0 gcstopm() /usr/local/go/src/pkg/runtime/proc.c:1122 +0xc4 fp=0x7fe4d5642e28 sp=0x7fe4d5642e08 schedule() /usr/local/go/src/pkg/runtime/proc.c:1321 +0x4a fp=0x7fe4d5642e50 sp=0x7fe4d5642e28 park0(0xc20833e6c0) /usr/local/go/src/pkg/runtime/proc.c:1410 +0xfe fp=0x7fe4d5642e70 sp=0x7fe4d5642e50 runtime.mcall(0x402a94) /usr/local/go/src/pkg/runtime/asm_amd64.s:181 +0x4b fp=0x7fe4d5642e80 sp=0x7fe4d5642e70 [...] goroutine 9629 [runnable]: runtime.cgocallbackg() /usr/local/go/src/pkg/runtime/cgocall.c:244 +0x4c fp=0xc2087117b8 sp=0xc2087117a8 runtime.cgocallback_gofunc(0x406d92, 0x402660, 0xc208711830) /usr/local/go/src/pkg/runtime/asm_amd64.s:785 +0x67 fp=0xc2087117c8 sp=0xc2087117b8 exitsyscall0(0x402660) /usr/local/go/src/pkg/runtime/proc.c:1674 fp=0xc2087117d0 sp=0xc2087117c8 runtime.cgocall(0x402660, 0xc208711830) /usr/local/go/src/pkg/runtime/cgocall.c:144 +0x102 fp=0xc208711818 sp=0xc2087117d0 github.com/pixiv/go-thumber/jpeg._Cfunc_jpeg_read_raw_data(0xc2080bf8c0, 0xc20800e720, 0x4, 0xc200000002) github.com/pixiv/go-thumber/jpeg/_obj/_cgo_defun.c:172 +0x31 fp=0xc208711830 sp=0xc208711818 [...] goroutine 9665 [runnable]: runtime: unexpected return pc for runtime.cgocallbackg called from 0x0 runtime.cgocallbackg() /usr/local/go/src/pkg/runtime/cgocall.c:244 +0x4c fp=0xc20821d7b8 sp=0xc20821d7a8 created by net/http.(*Server).Serve /usr/local/go/src/pkg/net/http/server.go:1721 +0x313 [...] Note how a stacktrace of goroutine 9629 seems to work, while 9665 fails even though they both seem to be in the same or a very similar code path (the low bits of SP match...).
Comment 10 by marcan@pixiv.co.jp:
I'm pretty sure I figured out what's going on. The tl;dr version is that runtime.cgocallbackg doesn't play nice with runtime.entersyscall and runtime.exitsyscall. They assume that the stack frame that called entersyscall is valid until exitsyscall is called, which is not the case when returning from a C->Go callback. entersyscall saves the caller's SP for use by the GC/traceback functions, but by the time exitsyscall is called that frame is no longer valid. This is a diagram of what I think happens: * = g->syscallsp g stack ========> g0 stack =======> Go [ReadJPEG()] runtime.cgocall(...) [C.jpeg_read_raw_data] *runtime.entersyscall() *runtime.asmcgocall(...) * === STACK SWITCH === * _cgo_C...() * c_func() [jpeg_read_raw_data()] * GoF() [mgr->fill_input_buffer() --> sourceFill()] * crosscall2(...) * _cgoexp_GoF() * runtime.cgocallback(...) * runtime.cgocallback_gofunc(...) * === STACK SWITCH === * runtime.cgocallbackg() * runtime.exitsyscall() runtime.cgocallbackg1() runtime.newstackcall(...) === STACK SWITCH === runtime.newstack(...) runtime.gogo(...) === STACK SWITCH === +> stack segment boundary | pkg.GoF() [jpeg.sourceFill()] +< <-runtime.newstackcall <-runtime.cgocallbackg1 / *runtime.entersyscall() | <*runtime.cgocallbackg | =*= STACK SWITCH === | * <-runtime.cgocallback_gofunc | * <-runtime.cgocallback | * <-_cgoexp_GoF | * <-crosscall2 | * <-GoF | * <-c_func [jpeg_read_raw_data] | * <-_cgo_C... | * === STACK SWITCH === | <-r*ntime.asmcgocall |\ run*ime.exitsyscall() <- actual crash happens here (1) | <-runtime.cgocall() | +- danger area, g->syscallsp < SP (1) exitsyscall calls runtime.mcall before setting syscallsp to nil. runtime.mcall yields. Eventually the GC gets called, tries to trace the stack starting at g->syscallsp, which is now garbage. This is a raw stack dump that I got while reproducing the original issue with a patched traceback function: goroutine 55430 [runnable]: runtime: unexpected return pc for runtime.cgocallbackg called from 0x0 pc0=0x4081fc sp0=0xc208855d70 lr0=0x0 gp->status=1 gp->syscallstack=0xc208859fa8 gp->syscallsp=0xc208855d70 gp->syscallpc=0x4081fc gp->syscallguard=0xc208852100 gp->sched.sp=0xc208855d88 gp->sched.pc=0x417eec gp->stackbase=0xc208859fa8 gp->stackguard=0xc208852100 gp->sched.ctxt=0x0 stack dump: 0xc208855cf0: 0x411f37 (runtime.futexsleep+0x0x47) 0xc208855cf8: 0xa84400 0xc208855d00: 0x200000000 0xc208855d08: 0x412002 (runtime.futexwakeup+0x0x52) 0xc208855d10: 0xa84400 0xc208855d18: 0x100000001 0xc208855d20: 0x0 0xc208855d28: 0x0 0xc208855d30: 0x0 0xc208855d38: 0xa84400 0xc208855d40: 0x408668 (runtime.unlock+0x0x68) 0xc208855d48: 0xa84400 0xc208855d50: 0x200000001 0xc208855d58: 0x200408483 0xc208855d60: 0x417fec (exitsyscallfast+0x0xbc) 0xc208855d68: 0xa84400 0xc208855d70: 0xc20806c00c <=== SP 0xc208855d78: 0x0 0xc208855d80: 0x417eec (runtime.exitsyscall+0x0x10c) <=== FP 0xc208855d88: 0x4180c0 (exitsyscall0+0x0x0) 0xc208855d90: 0x408082 (runtime.cgocall+0x0x102) 0xc208855d98: 0x402720 0xc208855da0: 0xc208855df8 0xc208855da8: 0x7f0100000000 0xc208855db0: 0xffffffffffffffff 0xc208855db8: 0x19000 0xc208855dc0: 0xa660d8 0xc208855dc8: 0xc2080bac30 0xc208855dd0: 0x42836b (runtime.new+0x0x3b) 0xc208855dd8: 0x566311 (github.com/pixiv/go-thumber/jpeg._Cfunc_jpeg_read_raw_data+0x0x31) 0xc208855de0: 0x402720 0xc208855de8: 0xc208855df8 0xc208855df0: 0x5635c0 (github.com/pixiv/go-thumber/jpeg.ReadJPEG+0x0x770) Hence we see a traceback where the tracer thinks it's in cgocallbackg (which called entersyscall), but the actual stack shows runtime.cgocall -> runtime.exitsyscall -> runtime.mcall, and the traceback routine is very confused since it can't find the LR in the right place. Note how gp->sched.sp > gp->syscallsp. I'm dumping some stack before the SP too, but those are invalid/leftover stack frames, so take them with a grain of salt. I've attached a patch that adds a sanity check to exitsyscall that panics if this ever happens, and a tentative fix to runtime.cgocallback that saves the syscallsp and restores it after entersyscall. I'm not sure if this is sufficient, though; I suspect it also needs to be restored in a defer in case of a panic. I didn't see any panic handling code in entersyscall/exitsyscall themselves so I'm not sure (haven't checked the normal syscall path yet). The sanity check will go off on every c->go callback if patched in alone, so that's an easy way to catch the issue without having to wait for the stars to align and the GC to get called in the right place and cause a crash. (Note: I've just signed the Individual CLA in case you want to use the patch; I assume it might take a while to get processed though.)
Attachments:
Comment 11 by hector@marcansoft.com:
I'm pretty sure I figured out what's going on. The tl;dr version is that runtime.cgocallbackg doesn't play nice with runtime.entersyscall and runtime.exitsyscall. They assume that the stack frame that called entersyscall is valid until exitsyscall is called, which is not the case when returning from a C->Go callback. entersyscall saves the caller's SP for use by the GC/traceback functions, but by the time exitsyscall is called that frame is no longer valid. This is a diagram of what I think happens: * = g->syscallsp g stack ========> g0 stack =======> Go [ReadJPEG()] runtime.cgocall(...) [C.jpeg_read_raw_data] *runtime.entersyscall() *runtime.asmcgocall(...) * === STACK SWITCH === * _cgo_C...() * c_func() [jpeg_read_raw_data()] * GoF() [mgr->fill_input_buffer() --> sourceFill()] * crosscall2(...) * _cgoexp_GoF() * runtime.cgocallback(...) * runtime.cgocallback_gofunc(...) * === STACK SWITCH === * runtime.cgocallbackg() * runtime.exitsyscall() runtime.cgocallbackg1() runtime.newstackcall(...) === STACK SWITCH === runtime.newstack(...) runtime.gogo(...) === STACK SWITCH === +> stack segment boundary | pkg.GoF() [jpeg.sourceFill()] +< <-runtime.newstackcall <-runtime.cgocallbackg1 / *runtime.entersyscall() | <*runtime.cgocallbackg | =*= STACK SWITCH === | * <-runtime.cgocallback_gofunc | * <-runtime.cgocallback | * <-_cgoexp_GoF | * <-crosscall2 | * <-GoF | * <-c_func [jpeg_read_raw_data] | * <-_cgo_C... | * === STACK SWITCH === | <-r*ntime.asmcgocall |\ run*ime.exitsyscall() <- actual crash happens here (1) | <-runtime.cgocall() | +- danger area, g->syscallsp < SP (1) exitsyscall calls runtime.mcall before setting syscallsp to nil. runtime.mcall yields. Eventually the GC gets called, tries to trace the stack starting at g->syscallsp, which is now garbage. This is a raw stack dump that I got while reproducing the original issue with a patched traceback function: goroutine 55430 [runnable]: runtime: unexpected return pc for runtime.cgocallbackg called from 0x0 pc0=0x4081fc sp0=0xc208855d70 lr0=0x0 gp->status=1 gp->syscallstack=0xc208859fa8 gp->syscallsp=0xc208855d70 gp->syscallpc=0x4081fc gp->syscallguard=0xc208852100 gp->sched.sp=0xc208855d88 gp->sched.pc=0x417eec gp->stackbase=0xc208859fa8 gp->stackguard=0xc208852100 gp->sched.ctxt=0x0 stack dump: 0xc208855cf0: 0x411f37 (runtime.futexsleep+0x0x47) 0xc208855cf8: 0xa84400 0xc208855d00: 0x200000000 0xc208855d08: 0x412002 (runtime.futexwakeup+0x0x52) 0xc208855d10: 0xa84400 0xc208855d18: 0x100000001 0xc208855d20: 0x0 0xc208855d28: 0x0 0xc208855d30: 0x0 0xc208855d38: 0xa84400 0xc208855d40: 0x408668 (runtime.unlock+0x0x68) 0xc208855d48: 0xa84400 0xc208855d50: 0x200000001 0xc208855d58: 0x200408483 0xc208855d60: 0x417fec (exitsyscallfast+0x0xbc) 0xc208855d68: 0xa84400 0xc208855d70: 0xc20806c00c <=== SP 0xc208855d78: 0x0 0xc208855d80: 0x417eec (runtime.exitsyscall+0x0x10c) <=== FP 0xc208855d88: 0x4180c0 (exitsyscall0+0x0x0) 0xc208855d90: 0x408082 (runtime.cgocall+0x0x102) 0xc208855d98: 0x402720 0xc208855da0: 0xc208855df8 0xc208855da8: 0x7f0100000000 0xc208855db0: 0xffffffffffffffff 0xc208855db8: 0x19000 0xc208855dc0: 0xa660d8 0xc208855dc8: 0xc2080bac30 0xc208855dd0: 0x42836b (runtime.new+0x0x3b) 0xc208855dd8: 0x566311 (github.com/pixiv/go-thumber/jpeg._Cfunc_jpeg_read_raw_data+0x0x31) 0xc208855de0: 0x402720 0xc208855de8: 0xc208855df8 0xc208855df0: 0x5635c0 (github.com/pixiv/go-thumber/jpeg.ReadJPEG+0x0x770) Hence we see a traceback where the tracer thinks it's in cgocallbackg (which called entersyscall), but the actual stack shows runtime.cgocall -> runtime.exitsyscall -> runtime.mcall, and the traceback routine is very confused since it can't find the LR in the right place. Note how gp->sched.sp > gp->syscallsp. I'm dumping some stack before the SP too, but those are invalid/leftover stack frames, so take them with a grain of salt. I've attached a patch that adds a sanity check to exitsyscall that panics if this ever happens, and a tentative fix to runtime.cgocallback that saves the syscallsp and restores it after entersyscall. I'm not sure if this is sufficient, though; I suspect it also needs to be restored in a defer in case of a panic. I didn't see any panic handling code in entersyscall/exitsyscall themselves so I'm not sure (haven't checked the normal syscall path yet). The sanity check will go off on every c->go callback if patched in alone, so that's an easy way to catch the issue without having to wait for the stars to align and the GC to get called in the right place and cause a crash. (Note: I've just signed the Individual CLA in case you want to use the patch; I assume it might take a while to get processed though.)
Attachments:
Thanks for the efforts to debug this issue. We cannot accept patches via the issue tracker, please use the code review tool described in the http://golang.org/doc/contribute.html
Status changed to Started.
Comment 13 by hector@marcansoft.com:
CL sent for review: https://golang.org/cl/131910043/ Upon thinking about it for a bit, I don't think any special handling for panics is needed, since AFAICT those should only happen in go code that isn't "in a syscall" (and should unwind skipping any C code/stack paths).
CL https://golang.org/cl/131910043 mentions this issue.
CL https://golang.org/cl/152970043 mentions this issue.
by leterip: