Closed zzkcode closed 11 months ago
For GDB, are you running the program under GDB, or are you using GDB to debug a core dump? In the former case, GDB should stop when it first gets a SIGSEGV, before the Go signal handler runs. You should be able to get a backtrace from there.
PC=0x46bd10
For the stderr output, it prints the PC where the fault happens, so you at least can see which function it is, although not a full stack trace.
Hi @cherrymui, thanks for your comment. I'm using GDB to debug a core dump. To be clearer: I want to get the right stack backtrace from a real application's core dump, so I wrote this as a verification. In a real application, the PC alone may not be enough to see what really happened without a correct backtrace.
I think it's related to using ucontext in cgo: if I do not bind the function with ucontext and just call the core C function directly instead of calling core_logic, the stack backtrace is fine.
It may be related to this issue: https://github.com/golang/go/issues/62130? However, the gotip version does not fix this. FYI @ianlancetaylor
package main

/*
#cgo CFLAGS: -g -O0
#include <stdio.h>
#include <stddef.h>
#include <ucontext.h>
#include <stdlib.h>
#include <signal.h>

static ucontext_t uctx_main, uctx_core;

void core()
{
    // core logic
    // trigger crash
    int* ptr = NULL;
    *ptr = 1024;
}

void core_logic()
{
    size_t size = 1024 * 1024;
    char stack[size]; // SIGSEGV
    //void* stack = malloc(size); // SIGTRAP
    if (getcontext(&uctx_core) == -1)
        printf("failed to getcontext");
    uctx_core.uc_stack.ss_sp = stack;
    uctx_core.uc_stack.ss_size = size;
    uctx_core.uc_link = &uctx_main;
    makecontext(&uctx_core, core, 0);
    if (swapcontext(&uctx_main, &uctx_core) == -1)
        printf("failed to swapcontext");
    printf("back\n");
}
*/
import "C"

import "runtime/debug"

func coreLogic() {
	C.core() // call core directly
}

func main() {
	debug.SetTraceback("crash")
	// Call the C function from Go
	coreLogic()
}
@golang/runtime
Just to be sure I understand correctly, the issue you are reporting is that in 1.18, from a core file, gdb shows the faulting function in a backtrace, while in 1.21 gdb does not show the faulting function?
It is not about the panic output from the runtime (which in your example IMO is better in 1.21 than 1.18).
Hi @prattmic, thanks for your comment.
Just to be sure I understand correctly, the issue you are reporting is that in 1.18, from a core file, gdb shows the faulting function in a backtrace, while in 1.21 gdb does not show the faulting function?
Yes, exactly! I suspect that it's related to using ucontext in cgo: if I call the C.core function directly instead of calling C.core_logic in go1.21.1, the stack backtrace is correct. C.core_logic just uses ucontext to bind the core function.
It is not about the panic output from the runtime (which in your example IMO is better in 1.21 than 1.18).
The first few lines of the panic output more or less show what the runtime printed while it was executing. Every time I see a correct backtrace, I also see a runtime stack section, which I believe is printed by dopanic_m in panic.go.
output with a runtime stack:
runtime stack:
runtime.throw({0x486e64?, 0x40bb79?})
/usr/local/go/src/runtime/panic.go:992 +0x71 fp=0x7ffd80fa4310 sp=0x7ffd80fa42e0 pc=0x4339b1
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:802 +0x225 fp=0x7ffd80fa4340 sp=0x7ffd80fa4310 pc=0x447de5
master branch: https://github.com/golang/go/blob/master/src/runtime/panic.go#L1334
// src/runtime/panic.go
} else if level >= 2 || gp.m.throwing >= throwTypeRuntime {
print("\nruntime stack:\n")
traceback(pc, sp, 0, gp)
}
Updated: I did a test and it still works on go1.20.8. So it may have been broken in go1.21.0? FYI @prattmic
More update: It seems that the thread that caused the crash is never thread number one in GDB. This may be harder to track down in a real application, which may have hundreds of threads. FYI @prattmic @cherrymui
Steps to reproduce under go1.21.1:
Show all threads and switch to each one to check its backtrace. It turns out that thread 6 is the one that caused the crash.
(gdb) i threads
Id Target Id Frame
* 1 Thread 0x7ff426ffd700 (LWP 5269) runtime.raise () at /usr/local/go/src/runtime/sys_linux_amd64.s:154
2 Thread 0x7ff42ca17700 (LWP 5266) runtime.usleep () at /usr/local/go/src/runtime/sys_linux_amd64.s:135
3 Thread 0x7ff4277fe700 (LWP 5268) runtime.usleep () at /usr/local/go/src/runtime/sys_linux_amd64.s:135
4 Thread 0x7ff474438740 (LWP 5264) runtime.usleep () at /usr/local/go/src/runtime/sys_linux_amd64.s:135
5 Thread 0x7ff42d218700 (LWP 5265) runtime.usleep () at /usr/local/go/src/runtime/sys_linux_amd64.s:135
6 Thread 0x7ff427fff700 (LWP 5267) runtime.usleep () at /usr/local/go/src/runtime/sys_linux_amd64.s:135
(gdb) t 6
[Switching to thread 6 (Thread 0x7ff427fff700 (LWP 5267))]
#0 runtime.usleep () at /usr/local/go/src/runtime/sys_linux_amd64.s:135
135 RET
(gdb) bt
#0 runtime.usleep () at /usr/local/go/src/runtime/sys_linux_amd64.s:135
#1 0x000000000044f473 in runtime.sighandler (sig=11, info=<optimized out>, ctxt=<optimized out>, gp=0xc000007ba0) at /usr/local/go/src/runtime/signal_unix.go:769
#2 0x000000000044ec11 in runtime.sigtrampgo (sig=11, info=0xc000087bf0, ctx=0xc000087ac0) at /usr/local/go/src/runtime/signal_unix.go:490
#3 0x000000000046aa66 in runtime.sigtramp () at /usr/local/go/src/runtime/sys_linux_amd64.s:352
#4 <signal handler called>
#5 0x000000000049c340 in core () at /container_share/works/badstack/main.go:19
#6 0x00007ff473c52150 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /lib64/libc.so.6
#7 0x0000000000765f80 in uctx_core ()
#8 0x0000000000440551 in runtime.execute (gp=0x49c459 <_cgo_96a711317223_Cfunc_core_logic+25>, inheritTime=184) at /usr/local/go/src/runtime/proc.go:2884
#9 0x00007ff427ffee10 in ?? ()
#10 0x000000000049c459 in _cgo_96a711317223_Cfunc_core_logic (v=0x100000) at cgo-gcc-prolog:49
Backtrace stopped: frame did not save the PC
Btw, we can try dlv and then gdb since we already know the bad goroutine id from the stdout/err:
However, this is not very intuitive to me: since the crash is caused by cgo code, we would usually reach for GDB first.
ps: as I mentioned before, this issue happens when using ucontext in cgo since go1.21.0.
Thanks. The signal on the "correct" thread seems to land at https://cs.opensource.google/go/go/+/refs/tags/go1.21.1:src/runtime/signal_unix.go;l=756-769, which is re-raising the signal when it thinks it is already crashing and another thread is dumping a stack trace.
Maybe that M count at https://cs.opensource.google/go/go/+/refs/tags/go1.21.1:src/runtime/signal_unix.go;l=756 isn't quite right and this is actually the first thread crashing? Could you print in GDB the value of mcount() and *extraMLength? mcount() is sched.mnext - sched.nmfreed. crashing is probably 1 if this is the first thread crashing.
@cherrymui Thanks. It seems to be exactly what you are saying.
(GDB could not print out some of them, so I used dlv instead.)
In go1.20.8, it looks like the panic handling starts from go1.20.8-signal_unix.go#L675, returns, and never runs the rest of the code in sighandler. In go1.21.1, however, it passes that point and runs the docrash block starting at go1.21.1-signal_unix.go#L754. crashing is incremented to 1, mcount() is 8, and extraMLength is 1 (details pasted below), so it does indeed relay SIGQUIT from the first thread, the one that hit the SIGSEGV?
(dlv) p sig
11
(dlv) n
> runtime.sighandler() /usr/local/go/src/runtime/signal_unix.go:756 (PC: 0x44a9ae)
Warning: debugging optimized function
751: dumpregs(c)
752: }
753:
754: if docrash {
755: crashing++
=> 756: if crashing < mcount()-int32(extraMLength.Load()) {
757: // There are other m's that need to dump their stacks.
758: // Relay SIGQUIT to the next m by sending it to the current process.
759: // All m's that have already received SIGQUIT have signal masks blocking
760: // receipt of any signals, so the SIGQUIT will go to an m that hasn't seen it yet.
761: // When the last m receives the SIGQUIT, it will fall through to the call to
(dlv) p crashing
1
(dlv) p sched.mnext
8
(dlv) p sched.nmfreed
0
(dlv) p extraMLength
runtime/internal/atomic.Uint32 {
noCopy: runtime/internal/atomic.noCopy {},
value: 1,}
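To make the arithmetic concrete, here is a minimal sketch of the relay condition quoted in the dlv listing, evaluated with the values printed above. The function name shouldRelay and its parameters are made up for illustration; this is not the runtime's code, only the same comparison written as a standalone program.

```go
package main

import "fmt"

// shouldRelay mirrors the comparison shown at signal_unix.go:756:
//
//	crashing < mcount() - int32(extraMLength.Load())
//
// where mcount() is sched.mnext - sched.nmfreed.
// The function itself is hypothetical and exists only for this example.
func shouldRelay(crashing, mnext, nmfreed, extraMLength int32) bool {
	mcount := mnext - nmfreed
	return crashing < mcount-extraMLength
}

func main() {
	// Values from the dlv session: crashing=1, mnext=8, nmfreed=0, extraMLength=1.
	// 1 < 8-0-1, so the first faulting thread relays SIGQUIT instead of crashing.
	fmt.Println(shouldRelay(1, 8, 0, 1))

	// Once crashing has reached 7 the comparison is 7 < 7, so that thread
	// falls through to crash().
	fmt.Println(shouldRelay(7, 8, 0, 1))
}
```

With these numbers the comparison is true for the faulting thread and only becomes false once seven threads have entered the block, which matches the observation that a different thread ends up as thread 1 in the core dump.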
Since go1.21.1 just skips the preparePanic block that starts from here, I checked the condition statements (shown below). (Unfortunately, I cannot print c.info.si_code directly, even though I did build with -gcflags="all=-N -l".) But I found that fatalsignal also prints it to stdout, which shows that sigcode is 0, so sigFromUser returns true and !c.sigFromUser() is always false. I don't know whether I debugged this correctly, but please let me know if there is anything useful I can provide.
(dlv) n
> runtime.sighandler() /usr/local/go/src/runtime/signal_unix.go:690 (PC: 0x44a794)
Warning: debugging optimized function
685: if isAbortPC(c.sigpc()) {
686: // On many architectures, the abort function just
687: // causes a memory fault. Don't turn that into a panic.
688: flags = _SigThrow
689: }
=> 690: if !c.sigFromUser() && flags&_SigPanic != 0 {
691: // The signal is going to cause a panic.
692: // Arrange the stack so that it looks like the point
693: // where the signal occurred made a call to the
694: // function sigpanic. Then set the PC to sigpanic.
695:
(dlv) p flags
4
(dlv) p _SigPanic
8
(dlv) p c.info.si_code
(unreadable empty OP stack)
(dlv) c
SIGQUIT: quit
PC=0x464881 m=3 sigcode=0
goroutine 0 [idle]:
runtime.futex()
/usr/local/go/src/runtime/sys_linux_amd64.s:557 +0x21 fp=0x7fbfba4bdbf8 sp=0x7fbfba4bdbf0 pc=0x464881
runtime.futexsleep(0x40b5be?, 0xba4bdc78?, 0x40bc13?)
/usr/local/go/src/runtime/os_linux.go:69 +0x30 fp=0x7fbfba4bdc58 sp=0x7fbfba4bdbf8 pc=0x432250
runtime.notesleep(0xc000058948)
/usr/local/go/src/runtime/lock_futex.go:160 +0x9d fp=0x7fbfba4bdc90 sp=0x7fbfba4bdc58 pc=0x40b7dd
runtime.mPark()
/usr/local/go/src/runtime/proc.go:1632 +0x1e fp=0x7fbfba4bdcb0 sp=0x7fbfba4bdc90 pc=0x43b31e
runtime.stopm()
/usr/local/go/src/runtime/proc.go:2536 +0x6d fp=0x7fbfba4bdcd8 sp=0x7fbfba4bdcb0 pc=0x43c84d
runtime.findRunnable()
/usr/local/go/src/runtime/proc.go:3229 +0x30 fp=0x7fbfba4bddf8 sp=0x7fbfba4bdcd8 pc=0x43d3d0
runtime.schedule()
/usr/local/go/src/runtime/proc.go:3582 +0xbd fp=0x7fbfba4bde30 sp=0x7fbfba4bddf8 pc=0x43eadd
runtime.park_m(0xc000007a00?)
/usr/local/go/src/runtime/proc.go:3745 +0x105 fp=0x7fbfba4bde60 sp=0x7fbfba4bde30 pc=0x43f045
runtime.mcall()
/usr/local/go/src/runtime/asm_amd64.s:458 +0x4e fp=0x7fbfba4bde78 sp=0x7fbfba4bde60 pc=0x460b2e
rax 0xca
rbx 0x0
rcx 0x464883
rdx 0x0
rdi 0xc000058948
rsi 0x80
rbp 0x7fbfba4bdc48
rsp 0x7fbfba4bdbf0
r8 0x0
r9 0x0
r10 0x0
r11 0x286
r12 0x7fbfba4bdc68
r13 0xffffffffffffffff
r14 0xc000006d00
r15 0x2031
rip 0x464881
rflags 0x286
cs 0x33
fs 0x0
gs 0x0
In triage, we're thinking that if this is fairly easy to reproduce, how hard would it be to bisect this across the Go toolchain? That might get down to the exact reason.
In the same vein, does Go 1.19 or Go 1.20 work?
@zzkcode Thanks. As the fault is from C code, I think it is desirable to not inject a sigpanic and return from the signal handler (like Go 1.20 did). This is intended to help debugging, because debuggers are not always able to unwind the synthetic sigpanic frame (on all platforms, although they sometimes can, as in the Go 1.18 example above). By not injecting the sigpanic frame, the debugger can directly unwind from the signal handler to C code.
I assume you set GOTRACEBACK=crash in order to trigger a core file generation, therefore docrash is true. In this case we relay SIGQUIT to all threads so they dump stack traces and registers. But that also means that all threads will receive a signal, not just the first faulting thread. I guess the core file is generated when a thread receives a signal the second time, as this time the signal is blocked and it is a fatal signal, so the kernel just kills it and generates a core. The fact that the GDB stack traces include two signal frames supports this. But it also means that it may not be the original faulting thread that dies first. In fact it may be the last one that receives the signal, as the condition crashing < mcount()-int32(extraMLength.Load()) is false for the last one, so it directly goes to crash. Unfortunately you'll need to loop over all threads to find the faulting one in GDB...
Maybe we could let the first faulting thread sleep for a shorter time. Maybe we could have the last thread signal the first one before crashing. Then the first faulting thread will be the one that crashes.
I wonder, though, why this has anything to do with ucontext...
Hi all, this issue should have nothing to do with using ucontext in cgo; it is a general problem of not dumping the correct stacks when crashing in cgo. Maybe I ran the wrong test on the wrong Go version before. Among the release versions, I had missed testing go1.21.0, so I ran some tests again, and it turns out that this issue starts with go1.21.0. Sorry for misleading you. FYI @cherrymui @mknyszek @bcmills @prattmic @ianlancetaylor
@zzkcode Thanks. As the fault is from C code, I think it is desirable to not inject a sigpanic and return from the signal handler (like Go 1.20 did). This is intended to help debugging, because debuggers are not always able to unwind the synthetic sigpanic frame (on all platforms, although it sometimes does like the Go 1.18 example above). By not injecting the sigpanic frame debugger can directly unwind from the signal handler to C code.
@cherrymui Thanks. If I understand correctly, Go 1.20 injects a sigpanic (my debugging also confirms this), while Go 1.21 does not. In Go 1.21, flags is assigned _SigThrow at go1.21.0-signal_unix.go#L683, so it skips the branch at go1.21.0-signal_unix.go#L690 and finally runs into the docrash block.
I assume you set GOTRACEBACK=crash in order to trigger a core file generation, therefore docrash is true. In this case we relay SIGQUIT to all threads so they dump stack traces and registers. But that also means that all threads will receive a signal, not just the first faulting thread. I guess the core file is generated when a thread receives a signal the second time, as this time the signal is blocked and it is a fatal signal, so the kernel just kills it and generates a core. The GDB stack traces include two signal frames supports this. But it also means that it may be not the original faulting thread that dies first. In fact it may be the last one that receives the signal, as the condition crashing < mcount()-int32(extraMLength.Load()) is false for the last one, so it directly goes to crash. Unfortunately you'll need to loop over all threads to find the faulting one in GDB...
Thanks for your explanation, now I get why the crashing thread is never thread number one in GDB.
Maybe we could let the first faulting thread sleeps shorter. Maybe we could have the last thread signals the first one before crashing. So the first faulting thread will be the one crashes.
It seems to me that without sigpanic, since go1.21.0 everyone who hits a crash in cgo has to find the real crashing thread themselves. Am I missing something here? It looks like no one has reported this yet?
I wonder, though, why this has anything to do with ucontext...
Sorry for the confusion, this does not seem to be related to ucontext.
flags starts with the _SigPanic bit set (136) and is reset to _SigThrow (4) in go1.21.0:
(dlv) b src/runtime/signal_unix.go:676
Breakpoint 1 set at 0x44a6b0 for runtime.sighandler() /usr/local/go/src/runtime/signal_unix.go:676
(dlv) b src/runtime/signal_unix.go:685
Breakpoint 2 set at 0x44a721 for runtime.sighandler() /usr/local/go/src/runtime/signal_unix.go:685
//...
(dlv) c
> runtime.sighandler() /usr/local/go/src/runtime/signal_unix.go:676 (hits goroutine(1):1 total:2) (PC: 0x44a6b0)
Warning: debugging optimized function
671:
672: flags := int32(_SigThrow)
673: if sig < uint32(len(sigtable)) {
674: flags = sigtable[sig].flags
675: }
=> 676: if !c.sigFromUser() && flags&_SigPanic != 0 && (gp.throwsplit || gp != mp.curg) {
677: // We can't safely sigpanic because it may grow the
678: // stack. Abort in the signal handler instead.
679: //
680: // Also don't inject a sigpanic if we are not on a
681: // user G stack. Either we're in the runtime, or we're
(dlv) p flags
136
(dlv) c
> runtime.sighandler() /usr/local/go/src/runtime/signal_unix.go:685 (hits goroutine(1):1 total:2) (PC: 0x44a721)
Warning: debugging optimized function
680: // Also don't inject a sigpanic if we are not on a
681: // user G stack. Either we're in the runtime, or we're
682: // running C code. Either way we cannot recover.
683: flags = _SigThrow
684: }
=> 685: if isAbortPC(c.sigpc()) {
686: // On many architectures, the abort function just
687: // causes a memory fault. Don't turn that into a panic.
688: flags = _SigThrow
689: }
690: if !c.sigFromUser() && flags&_SigPanic != 0 {
(dlv) p flags
4
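As a small worked check of the two flags values printed above, the sketch below evaluates the flags&_SigPanic test from line 690 for both of them. The constant values are taken from this session (_SigPanic printed as 8, flags equal to 4 after the assignment flags = _SigThrow); the names sigPanic, sigThrow and injectsSigpanic are made up for the example and are not the runtime's identifiers.

```go
package main

import "fmt"

const (
	sigThrow = 4 // value of flags after "flags = _SigThrow" in the session above
	sigPanic = 8 // value printed by "p _SigPanic" earlier in this thread
)

// injectsSigpanic mirrors the condition at signal_unix.go:690:
//
//	!c.sigFromUser() && flags&_SigPanic != 0
//
// It is a hypothetical helper for illustration only.
func injectsSigpanic(fromUser bool, flags int32) bool {
	return !fromUser && flags&sigPanic != 0
}

func main() {
	// flags as read from sigtable at line 676: 136 has the 8 bit set.
	fmt.Println(injectsSigpanic(false, 136))

	// flags after being overwritten with _SigThrow at line 683: 4 has no 8 bit.
	fmt.Println(injectsSigpanic(false, sigThrow))
}
```

So even when the signal is not treated as user-sent, the second check cannot succeed once flags has been overwritten, and the handler continues to the docrash block.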
Please let me know if I can provide more on this.
Btw, is there a workaround (it seems not?), or will a fix be considered? The thing is that we have a program using ucontext that has an issue which should be fixed by https://github.com/golang/go/issues/62130, and that fix requires go1.21.0. Thanks.
Hi @cherrymui, by making the faulting thread sleep shorter, it would be able to generate a core dump with the faulting thread as the number one thread. I have verified this both in the simple reproducer and my application. Code changes are pasted below, and if you don't mind, I would love to fire a PR to fix this and backport it to the next minor release if possible. FYI @mknyszek @prattmic
These code changes may need more work:
Example code change, which should be refined to make it crash as soon as possible:
diff --git a/src/runtime/signal_unix.go b/src/runtime/signal_unix.go
index cd9fd5d796..09aa95c2dc 100644
--- a/src/runtime/signal_unix.go
+++ b/src/runtime/signal_unix.go
@@ -752,6 +752,12 @@ func sighandler(sig uint32, info *siginfo, ctxt unsafe.Pointer, gp *g) {
}
if docrash {
+ var sleepTime uint32
+ sleepTime = 5 * 1000 * 1000
+ if crashing == 0 {
+ sleepTime = 3 * 1000 * 1000
+ }
+
crashing++
if crashing < mcount()-int32(extraMLength.Load()) {
// There are other m's that need to dump their stacks.
@@ -766,8 +772,8 @@ func sighandler(sig uint32, info *siginfo, ctxt unsafe.Pointer, gp *g) {
// 5-second sleeps have finished.
print("\n-----\n\n")
raiseproc(_SIGQUIT)
- usleep(5 * 1000 * 1000)
}
+ usleep(sleepTime)
printDebugLog()
crash()
}
Change https://go.dev/cl/536895 mentions this issue: runtime: let the fault thread to crash the process
Hi @cherrymui and all, I believe this is a general issue when using cgo since go1.21.0. I submitted a PR to fix it (just a minor change); please consider taking a look if possible. Thanks!
@zzkcode Thanks for the CL, I'll take a look. Question: with the change, will a crashing process always take at least 3 seconds to exit?
@zzkcode Thanks for the CL, I'll take a look. Question: with the change, will a crashing process always take at least 3 seconds to exit?
Hi @cherrymui, it will not. With the change, the first thread, the one that caused the crash, sleeps in 500ms intervals and after each interval checks whether all m's have received the SIGQUIT; if so, it crashes. So the time depends on how long it takes for all m's to receive the SIGQUIT. I believe this is not a big change from before, since previously the last m crashed the process.
So in the normal case it takes basically no more than 500ms to crash. Maybe we could sleep in 200ms intervals if you have any concerns? Thanks.
Hi @cherrymui and all, thanks! Since this fix is ready for merging, could we consider backporting it to the next minor release of go1.21, considering:
Or will it be released in the next minor/major release? I don't know the Go team's practice on this. Do you only backport fixes for really critical issues? Thanks :)
@zzkcode thanks for the CL. I think it is reasonable to include this CL in Go 1.22 as a bug fix. The risk is very low, as it only affects programs that are already crashing.
I don't think this needs to be backported. We usually just backport fixes for critical issues with no workaround. This is not critical (the program is crashing either way) and has workarounds (e.g. check all threads in GDB).
Thanks.
The same issue.
@zzkcode @cherrymui Unfortunately, the problem remains on arm64:
$ gdb -nx -batch -ex bt cgo-crash /var/core/cgo-crash.1724899484.2105538.core
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `./cgo-crash'.
Program terminated with signal SIGABRT, Aborted.
#0 runtime.raise () at /opt/xxx/go/src/runtime/sys_linux_arm64.s:158
158 /opt/xxx/go/src/runtime/sys_linux_arm64.s: No such file or directory.
[Current thread is 1 (Thread 0x7f5b7fe1d0 (LWP 2105542))]
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /home/admin/cgo-crash.
Use `info auto-load python-scripts [REGEXP]' to list them.
#0 runtime.raise () at /opt/xxx/go/src/runtime/sys_linux_arm64.s:158
#1 0x000000000044e884 in runtime.dieFromSignal (sig=6) at /opt/xxx/go/src/runtime/signal_unix.go:923
#2 0x000000000044ef30 in runtime.sigfwdgo (sig=6, info=<optimized out>, ctx=<optimized out>, ~r0=<optimized out>) at /opt/xxx/go/src/runtime/signal_unix.go:1128
#3 0x000000000044d53c in runtime.sigtrampgo (sig=0, info=0x2020c6, ctx=0x6) at /opt/xxx/go/src/runtime/signal_unix.go:432
#4 0x0000000000469a54 in runtime.sigtramp () at /opt/xxx/go/src/runtime/sys_linux_arm64.s:462
I use the same test case in https://github.com/golang/go/commit/de5b418bea70aaf27de1f47e9b5813940d1e15a4
@cherrymui It seems that this commit caused it: https://github.com/golang/go/commit/a0c9d153e0c177677701b8a4e6e5eba5a6c44a4f
If I revert the commit, I can at least get the function where the SIGSEGV happened:
admin@lambda:~$ gdb -nx -batch -ex bt cgo-crash /var/core/cgo-crash.1724910818.2258033.core
Program terminated with signal SIGABRT, Aborted.
#0 runtime.raise () at /opt/xxx/go/src/runtime/sys_linux_arm64.s:158
158 /opt/xxx/go/src/runtime/sys_linux_arm64.s: No such file or directory.
[Current thread is 1 (Thread 0x7fb4998010 (LWP 2258033))]
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /home/admin/cgo-crash.
Use `info auto-load python-scripts [REGEXP]' to list them.
#0 runtime.raise () at /opt/xxx/go/src/runtime/sys_linux_arm64.s:158
#1 0x000000000044cee8 in runtime.dieFromSignal (sig=6) at /opt/xxx/go/src/runtime/signal_unix.go:937
#2 0x00000000004352bc in runtime.crash () at /opt/xxx/go/src/runtime/signal_unix.go:1006
#3 runtime.fatalthrow.func1 () at /opt/xxx/go/src/runtime/panic.go:1203
#4 0x0000000000435238 in runtime.fatalthrow (t=<optimized out>) at /opt/xxx/go/src/runtime/panic.go:1192
#5 0x0000000000434e30 in runtime.throw (s=...) at /opt/xxx/go/src/runtime/panic.go:1023
#6 0x000000000044ce6c in runtime.sigpanic () at /opt/xxx/go/src/runtime/signal_unix.go:866
#7 0x000000000048d9f4 in test () at test.c:6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
It shows exactly test () at test.c:6.
test.c:
#include <stdio.h>
#include <stddef.h>

void test()
{
    int* ptr = NULL;
    *ptr = 1024;
}

void trigger_crash()
{
    printf("hello world\n");
    test();
}
test.h:
#ifndef FDDE6B57_4166_4D0B_9BED_C9BF03D209B8
#define FDDE6B57_4166_4D0B_9BED_C9BF03D209B8
void trigger_crash();
#endif /* FDDE6B57_4166_4D0B_9BED_C9BF03D209B8 */
main.go:
package main

/*
#include <test.h>
*/
import "C"

import (
	"fmt"
	"os"
	"os/signal"
	"runtime/debug"
	"syscall"
)

func enableCore() {
	debug.SetTraceback("crash")

	var lim syscall.Rlimit
	err := syscall.Getrlimit(syscall.RLIMIT_CORE, &lim)
	if err != nil {
		panic(fmt.Sprintf("error getting rlimit: %v", err))
	}
	lim.Cur = lim.Max
	fmt.Fprintf(os.Stderr, "Setting RLIMIT_CORE = %+#v\n", lim)
	err = syscall.Setrlimit(syscall.RLIMIT_CORE, &lim)
	if err != nil {
		panic(fmt.Sprintf("error setting rlimit: %v", err))
	}

	signal.Ignore(syscall.SIGABRT)
}

func main() {
	enableCore()
	C.trigger_crash()
}
Hi @miles-byted ,
It seems more like a stack-unwinding issue to me. I tried the test case on amd64 and arm64: dlv shows correct backtraces on both architectures, but gdb does not on arm64. All tests used go1.23.0.
I'm not familiar with arm64 or stack unwinding, so this is just a guess. Maybe you could create a new issue for this?
arm64 threads and backtraces:
(gdb) i threads
Id Target Id Frame
* 1 Thread 0xffffa6cbf010 (LWP 59900) runtime.raise ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_arm64.s:158
2 Thread 0xffff6004f1c0 (LWP 59901) runtime.usleep ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_arm64.s:138
3 Thread 0xffff5f80e1c0 (LWP 59902) runtime.usleep ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_arm64.s:138
4 Thread 0xffff5f00d1c0 (LWP 59903) runtime.usleep ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_arm64.s:138
5 Thread 0xffff5e7cc1c0 (LWP 59904) runtime.usleep ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_arm64.s:138
6 Thread 0xffff5dfcb1c0 (LWP 59905) runtime.usleep ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_arm64.s:138
(gdb) bt
#0 runtime.raise () at /root/sdk/go1.23.0/src/runtime/sys_linux_arm64.s:158
#1 0x000000000044cde4 in runtime.dieFromSignal (sig=6)
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:942
#2 0x000000000044d3d0 in runtime.sigfwdgo (sig=6, info=<optimized out>,
ctx=<optimized out>, ~r0=<optimized out>)
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:1154
#3 0x000000000044c00c in runtime.sigtrampgo (sig=0, info=0xe9fc, ctx=0x6)
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:432
#4 0x0000000000470884 in runtime.sigtramp ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_arm64.s:462
amd64 threads and backtrace:
(gdb) i threads
Id Target Id Frame
* 1 Thread 0x7f1fde32a740 (LWP 4132) 0x000000000046f886 in runtime.sigtramp ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:352
2 Thread 0x7f1f95800700 (LWP 4136) runtime.usleep ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:135
3 Thread 0x7f1f96c00700 (LWP 4134) runtime.usleep ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:135
4 Thread 0x7f1f94e00700 (LWP 4137) runtime.usleep ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:135
5 Thread 0x7f1f97600700 (LWP 4133) runtime.usleep ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:135
6 Thread 0x7f1f96200700 (LWP 4135) runtime.usleep ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:135
(gdb) bt
#0 runtime.raise () at /root/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:154
#1 0x000000000044ad85 in runtime.dieFromSignal (sig=6)
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:942
#2 0x000000000044b3e6 in runtime.sigfwdgo (sig=6, info=<optimized out>, ctx=<optimized out>,
~r0=<optimized out>) at /root/sdk/go1.23.0/src/runtime/signal_unix.go:1154
#3 0x0000000000449d85 in runtime.sigtrampgo (sig=0, info=0x0, ctx=0x46f5a1 <runtime.raise+33>)
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:432
#4 0x000000000046f886 in runtime.sigtramp ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:352
#5 <signal handler called>
#6 runtime.raise () at /root/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:154
#7 0x000000000044ad85 in runtime.dieFromSignal (sig=6)
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:942
#8 0x000000000044a906 in runtime.crash () at /root/sdk/go1.23.0/src/runtime/signal_unix.go:1031
#9 runtime.sighandler (sig=<optimized out>, info=<optimized out>, ctxt=<optimized out>,
gp=<optimized out>) at /root/sdk/go1.23.0/src/runtime/signal_unix.go:806
#10 0x0000000000449e86 in runtime.sigtrampgo (sig=11, info=0xc00000fbf0, ctx=0xc00000fac0)
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:490
#11 0x000000000046f886 in runtime.sigtramp ()
at /root/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:352
#12 <signal handler called>
#13 0x000000000049a9d1 in trigger_crash ()
at /container_share/cs-practices/gogo/go_cgo_crash_new/main.go:9
#14 0x000000000049a9fc in _cgo_517cb600b495_Cfunc_trigger_crash (v=0xc000060f20)
at cgo-gcc-prolog:51
#15 0x000000000046da44 in runtime.asmcgocall () at /root/sdk/go1.23.0/src/runtime/asm_amd64.s:923
#16 0x000000c0000061c0 in ?? ()
#17 0x000000000046be0a in runtime.systemstack () at /root/sdk/go1.23.0/src/runtime/asm_amd64.s:514
#18 0x00007fffccafb0f8 in ?? ()
#19 0x000000000047041f in runtime.newproc (fn=0x46bc8f <runtime.rt0_go+303>) at <autogenerated>:1
#20 0x000000000046bd05 in runtime.mstart () at /root/sdk/go1.23.0/src/runtime/asm_amd64.s:395
#21 0x000000000046bc8f in runtime.rt0_go () at /root/sdk/go1.23.0/src/runtime/asm_amd64.s:358
#22 0x0000000000000001 in ?? ()
#23 0x00007fffccafb228 in ?? ()
#24 0x00007fffccafb220 in ?? ()
#25 0x0000000000000001 in ?? ()
#26 0x00007fffccafb228 in ?? ()
#27 0x00007f1fde351083 in __libc_start_main (main=0x46bb40 <main>, argc=1, argv=0x7fffccafb228,
init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffccafb218)
at ../csu/libc-start.c:308
#28 0x000000000040231e in _start ()
dlv on amd64:
(dlv) bt
0 0x000000000046f5a1 in runtime.raise
at /root/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:154
1 0x000000000044ad85 in runtime.dieFromSignal
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:942
2 0x000000000044b3e6 in runtime.sigfwdgo
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:1154
3 0x0000000000449d85 in runtime.sigtrampgo
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:432
4 0x000000000046f5a1 in runtime.raise
at /root/sdk/go1.23.0/src/runtime/sys_linux_amd64.s:153
5 0x000000000044ad85 in runtime.dieFromSignal
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:942
6 0x000000000044a906 in runtime.crash
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:1031
7 0x000000000044a906 in runtime.sighandler
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:806
8 0x0000000000449e86 in runtime.sigtrampgo
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:490
9 0x000000000049a9d1 in C.trigger_crash
at ./main.go:9
10 0x000000000049a9fc in C._cgo_517cb600b495_Cfunc_trigger_crash
at /tmp/go-build/cgo-gcc-prolog:51
11 0x000000000046da44 in runtime.asmcgocall
at /root/sdk/go1.23.0/src/runtime/asm_amd64.s:923
12 0x0000000000000000 in ???
at ?:-1
13 0x0000000000462575 in runtime.cgocall
at /root/sdk/go1.23.0/src/runtime/cgocall.go:185
14 0x000000000049a75f in main._Cfunc_trigger_crash
at _cgo_gotypes.go:43
15 0x000000000049a957 in main.main
at ./main.go:38
16 0x000000000043710b in runtime.main
at /root/sdk/go1.23.0/src/runtime/proc.go:272
17 0x000000000046ddc1 in runtime.goexit
at /root/sdk/go1.23.0/src/runtime/asm_amd64.s:1700
dlv on arm64:
(dlv) bt
0 0x00000000004704d8 in runtime.raise at /root/sdk/go1.23.0/src/runtime/sys_linux_arm64.s:158
1 0x000000000044cde4 in runtime.dieFromSignal
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:942
2 0x000000000044d3d0 in runtime.sigfwdgo
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:1154
3 0x000000000044c00c in runtime.sigtrampgo
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:432
4 0x00000000004704d8 in runtime.raise
at /root/sdk/go1.23.0/src/runtime/sys_linux_arm64.s:157
5 0x000000000044cde4 in runtime.dieFromSignal
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:942
6 0x000000000044cf84 in runtime.crash
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:1031
7 0x000000000044c9dc in runtime.sighandler
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:806
8 0x000000000044c0d0 in runtime.sigtrampgo
at /root/sdk/go1.23.0/src/runtime/signal_unix.go:490
9 0x00000000004b272c in C.trigger_crash
at ./main.go:9
10 0x00000000004b2754 in C._cgo_517cb600b495_Cfunc_trigger_crash
at /tmp/go-build/cgo-gcc-prolog:51
11 0x000000000046f4cc in runtime.asmcgocall
at /root/sdk/go1.23.0/src/runtime/asm_arm64.s:1000
12 0x00000000004641c8 in runtime.cgocall
at /root/sdk/go1.23.0/src/runtime/cgocall.go:185
13 0x00000000004b23b0 in main._Cfunc_trigger_crash
at _cgo_gotypes.go:43
14 0x00000000004b26d4 in main.main
at ./main.go:38
15 0x0000000000439064 in runtime.main
at /root/sdk/go1.23.0/src/runtime/proc.go:272
16 0x000000000046f6d4 in runtime.goexit
at /root/sdk/go1.23.0/src/runtime/asm_arm64.s:1223
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
I'm writing a Go+cgo program and testing deliberately bad code in cgo to check whether I can get the right backtrace. Different versions of Go give me different outputs: it works in go1.18.3, but not in go1.21.1 or the latest gotip version (go version devel go1.22-1176052 Thu Sep 28 03:38:07 2023 +0000 linux/amd64).
Like the backtrace from go1.18.3, I would expect it to give the right backtrace for the code that caused the crash. I noticed that if it prints "fatal error: unexpected signal during runtime execution" together with "runtime stack:" on stdout, then the backtrace is correct. Otherwise, it is not. Extracting the C code gives me the right backtrace, btw.
What did you expect to see?
Generating the correct backtrace(details will be in the Details section).
What did you see instead?
It's not generating the correct backtrace(details will be in the Details section).
Details
go1.18.3: from the backtrace, we can see that the crash comes from the core function, in frame 14:
std:
go1.21.1: from the backtrace, we cannot tell which function caused the crash:
std:
Pure C: from the backtrace, we can see that the crash is caused by the core function, in frame 0.
Reproducer
Please compile with:
CC=clang CXX=clang++ CFLAGS="$(cflags)" go build -gcflags="all=-N -l" main.go
Environment
Updated on 2023-09-29: I think it's related to using ucontext in cgo: if I do not bind the function with ucontext and just call the core C function directly instead of calling core_logic, the stack backtrace is fine. It may also be related to this issue: https://github.com/golang/go/issues/62130.