While attempting to enable frame pointer unwinding for execution tracing on arm64, I ran the TestTraceUnwindCGO test. This test crashes on arm64 during frame pointer unwinding. The test, unaltered, does not crash on amd64. However, I think amd64 is susceptible to the same issue, namely that there's a trace event during a C-to-Go call which currently gets a call stack via frame pointer unwinding when it probably shouldn't.
We have the following logic in place to determine whether we're in a C-to-Go call:
If so, we fall back to regular unwinding, to avoid possibly following bad frame pointers coming from the C code. However, there is a point during the C-to-Go call where the "incgocallback" check is false, because incgo == true, yet we will still emit a trace event, during reentersyscall:
I can make TestTraceUnwindCGO crash on amd64 by deliberately putting junk in RBP and confirming that the tracer attempts to unwind through the junk address:
Then go build ./testdata/testprogcgo, then gdb ./testprogcgo:
(gdb) r Trace
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/ec2-user/repos/go/src/runtime/testprogcgo Trace
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffd03e2700 (LWP 18813)]
[New Thread 0x7fffcf9e1700 (LWP 18814)]
[New Thread 0x7fffcefe0700 (LWP 18815)]
[New Thread 0x7fffce5df700 (LWP 18816)]
[New Thread 0x7fffcdb9e700 (LWP 18817)]
[New Thread 0x7fffcd19d700 (LWP 18818)]
Thread 1 "testprogcgo" received signal SIGSEGV, Segmentation fault.
runtime.traceStackID (mp=<optimized out>, pcBuf=[]uintptr = {...}, skip=<optimized out>, ~r0=<optimized out>) at /home/ec2-user/repos/go/src/runtime/trace.go:916
916 nstk += 1 + fpTracebackPCs(unsafe.Pointer(curgp.sched.bp), skip, pcBuf[2:])
(gdb) bt
#0 runtime.traceStackID (mp=<optimized out>, pcBuf=[]uintptr = {...}, skip=<optimized out>, ~r0=<optimized out>) at /home/ec2-user/repos/go/src/runtime/trace.go:916
#1 0x000000000045a865 in runtime.traceEventLocked (extraBytes=<optimized out>, mp=0x85f0e0 <runtime.m0>, pid=<optimized out>, bufp=0xc00002bbd0, ev=28 '\034', stackID=0, skip=1,
args=<error reading variable: access outside bounds of object referenced via synthetic pointer>) at /home/ec2-user/repos/go/src/runtime/trace.go:759
#2 0x000000000045a569 in runtime.traceEvent (ev=28 '\034', skip=1, args=[]uint64) at /home/ec2-user/repos/go/src/runtime/trace.go:691
#3 0x000000000045d225 in runtime.traceGoSysCall () at /home/ec2-user/repos/go/src/runtime/trace.go:1573
#4 0x000000000046a42a in runtime.systemstack () at /home/ec2-user/repos/go/src/runtime/asm_amd64.s:509
#5 0x00007fffffffe268 in ?? ()
#6 0x000000000085f0e0 in ?? ()
#7 0x00007fffffffe250 in ?? ()
#8 0x00000000004ae39d in crosscall2 () at /home/ec2-user/repos/go/src/runtime/cgo/asm_amd64.s:30
#9 0x000000000051780e in goCalledFromC () at _cgo_export.c:212
(gdb) disas
... elided ...
=> 0x000000000045b297 <+535>: mov 0x8(%rsi),%r11
0x000000000045b29b <+539>: mov %r11,(%r9,%rax,8)
0x000000000045b29f <+543>: mov (%rsi),%rsi
0x000000000045b2a2 <+546>: inc %rax
0x000000000045b2a5 <+549>: cmp %r10,%rax
(gdb) p/x $rsi
$2 = 0xdeadbeef
I think we need something like an incgocallback boolean that's set to be true for the entirety of cgocallbackg, or at least something that covers the reentersyscall call at the end of cgocallbackg.
While attempting to enable frame pointer unwinding for execution tracing on arm64, I ran the
TestTraceUnwindCGO
test. This test crashes on arm64 during frame pointer unwinding. The test, unaltered, does not crash on amd64. However, I think amd64 is susceptible to the same issue, namely that there's a trace event during a C-to-Go call which currently gets a call stack via frame pointer unwinding when it probably shouldn't.We have the following logic in place to determine whether we're in a C-to-Go call:
https://github.com/golang/go/blob/715d53c090ea02dbd73c301684ecbd09b476989e/src/runtime/proc.go#L860-L862
If so, we fall back to regular unwinding, to avoid possibly following bad frame pointers coming from the C code. However, there is a point during the C-to-Go call where the "incgocallback" check is false, because
incgo == true
, yet we will still emit a trace event, duringreentersyscall
:https://github.com/golang/go/blob/715d53c090ea02dbd73c301684ecbd09b476989e/src/runtime/cgocall.go#L241-L250
I can make
TestTraceUnwindCGO
crash on amd64 by deliberately putting junk in RBP and confirming that the tracer attempts to unwind through the junk address:Then
go build ./testdata/testprogcgo
, thengdb ./testprogcgo
:I think we need something like an
incgocallback
boolean that's set to be true for the entirety ofcgocallbackg
, or at least something that covers thereentersyscall
call at the end ofcgocallbackg
.cc @felixge @mknyszek @prattmic