golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.8k stars 17.64k forks source link

cmd/cgo: Null pointer is accessed in C code, causing the stack to be destroyed #41070

Open fananchong opened 4 years ago

fananchong commented 4 years ago

What version of Go are you using (go version)?

$ go version
go version go1.14.3 linux/amd64

Does this issue reproduce with the latest release?

YES

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/fananchong/.cache/go-build"
GOENV="/home/fananchong/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/fananchong/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build984316541=/tmp/go-build -gno-record-gcc-switches"

What did you do?

test.cpp:

#include <stdio.h>
#include <assert.h>
#include <string.h>
#include <stdlib.h>
#include "test.h"

class A
{
public:
    void release() { free(ptr); }

private:
    void *ptr;
};

void test_crash2()
{
    A *a = (A *)0;
    a->release();
}

test.h:

#pragma once

#if __cplusplus
extern "C"
{
#endif

    extern void test_crash2();

#if __cplusplus
}
#endif

main.go:

package main

/*
#include "test.h"
#include <stdlib.h>
#cgo LDFLAGS: -L. -ltest
*/
import "C"

import (
    _ "fmt"
)

func main() {
    C.test_crash2()
    select {}
}

build cmd:

gcc -I. -g3 -O0 -c test.cpp
ar cr libtest.a test.o
go build main.go

then run env GOTRACEBACK=crash ./main

the core dump is:

(gdb) bt
#0  runtime.raise () at /usr/local/go/src/runtime/sys_linux_amd64.s:165
#1  0x000000000044435b in runtime.dieFromSignal (sig=6) at /usr/local/go/src/runtime/signal_unix.go:729
#2  0x00000000004447ee in runtime.sigfwdgo (sig=6, info=0xc000009af0, ctx=0xc0000099c0, ~r3=<optimized out>) at /usr/local/go/src/runtime/signal_unix.go:943
#3  0x0000000000443264 in runtime.sigtrampgo (sig=6, info=0xc000009af0, ctx=0xc0000099c0) at /usr/local/go/src/runtime/signal_unix.go:412
#4  0x000000000045efc3 in runtime.sigtramp () at /usr/local/go/src/runtime/sys_linux_amd64.s:389
#5  <signal handler called>
#6  runtime.raise () at /usr/local/go/src/runtime/sys_linux_amd64.s:165
#7  0x000000000044435b in runtime.dieFromSignal (sig=6) at /usr/local/go/src/runtime/signal_unix.go:729
#8  0x00000000004444ea in runtime.crash () at /usr/local/go/src/runtime/signal_unix.go:821
#9  0x0000000000459c61 in runtime.fatalthrow.func1 () at /usr/local/go/src/runtime/panic.go:1175
#10 0x0000000000430637 in runtime.fatalthrow () at /usr/local/go/src/runtime/panic.go:1168
#11 0x0000000000430472 in runtime.throw (s=...) at /usr/local/go/src/runtime/panic.go:1116
#12 0x000000000044430a in runtime.sigpanic () at /usr/local/go/src/runtime/signal_unix.go:679
#13 0x000000000047fa06 in A::release (this=0x47fa06 <A::release()+20>) at test.cpp:33
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

What did you expect to see?

code is:

    A *a = (A *)0;
    a->release();

I expect this pointer to be 0, and the stack is not destroyed

What did you see instead?

#13 0x000000000047fa06 in A::release (this=0x47fa06 <A::release()+20>) at test.cpp:33

this pointer is 0x47fa06 and Backtrace stopped: previous frame inner to this frame (corrupt stack?)

randall77 commented 4 years ago

The traceback the runtime prints when crashing looks fine:

$ GOTRACEBACK=crash ./main 
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x47fcf6]

runtime stack:
runtime.throw(0x4b0e0c, 0x2a)
    /home/khr/go1.14.7/src/runtime/panic.go:1116 +0x72 fp=0x7fff21843310 sp=0x7fff218432e0 pc=0x430512
runtime.sigpanic()
    /home/khr/go1.14.7/src/runtime/signal_unix.go:679 +0x46a fp=0x7fff21843340 sp=0x7fff21843310 pc=0x44449a

goroutine 1 [syscall]:
runtime.cgocall(0x47f560, 0xc000053f38, 0x0)
    /home/khr/go1.14.7/src/runtime/cgocall.go:133 +0x5b fp=0xc000053f08 sp=0xc000053ed0 pc=0x40508b
main._Cfunc_test_crash2()
    _cgo_gotypes.go:43 +0x41 fp=0xc000053f38 sp=0xc000053f08 pc=0x47f511
main.main()
    /home/khr/gowork/issue41070/main.go:15 +0x20 fp=0xc000053f48 sp=0xc000053f38 pc=0x47f550
runtime.main()
    /home/khr/go1.14.7/src/runtime/proc.go:203 +0x21a fp=0xc000053fe0 sp=0xc000053f48 pc=0x432b9a
runtime.goexit()
    /home/khr/go1.14.7/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc000053fe8 sp=0xc000053fe0 pc=0x45d491

(But it of course doesn't contain the C part of the stack.)

I can reproduce the gdb problem you're seeing. When run under gdb directly, the stack trace is even weirder. I have similar, but not identical, problems with lldb trying to get backtraces.

Delve handles this just fine, although it also doesn't show the C parts:

(dlv) bt
0  0x000000000045f011 in runtime.raise
   at /home/khr/go1.14.7/src/runtime/sys_linux_amd64.s:165
1  0x000000000045cba2 in runtime.asmcgocall
   at /home/khr/go1.14.7/src/runtime/asm_amd64.s:640
2  0x00000000004050af in runtime.cgocall
   at /home/khr/go1.14.7/src/runtime/cgocall.go:143
3  0x000000000047f511 in main._Cfunc_test_crash2
   at _cgo_gotypes.go:43
4  0x000000000047f550 in main.main
   at ./main.go:15
5  0x0000000000432b9a in runtime.main
   at /home/khr/go1.14.7/src/runtime/proc.go:203
6  0x000000000045d491 in runtime.goexit
   at /home/khr/go1.14.7/src/runtime/asm_amd64.s:1373

Not sure why gdb/lldb can't backtrace the Go side. We do generate Dwarf, but maybe it's not sufficient to encode things like stack switches? Or maybe it is just wrong? @thanm

gdb behavior can vary a lot by version. I used:

$ gdb --version
GNU gdb (Debian 9.1-3) 9.1
$ lldb --version
lldb version 9.0.1
$ dlv version
Delve Debugger
Version: 1.5.0
Build: $Id: ca5318932770ca063fc9885b4764c30bfaf8a199 $
ianlancetaylor commented 4 years ago

Backtrace stopped: previous frame inner to this frame (corrupt stack?)

I'm fairly certain that the stack has not been destroyed. It's just that gdb has gotten confused.

Which version of gdb are you using?

fananchong commented 4 years ago

gdb version:

GNU gdb (Ubuntu 9.1-0ubuntu1) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please use thread <no> to switch to the thread of A::release.

ianlancetaylor commented 4 years ago

The problem seems to be because we are calling runtime.(*sigctxt).preparePanic when the signal is raised by C code, in this case A::release. This adjusts the registers saved by the signal handler to act as though the code calls sigpanic. gdb is able to unwind past sigpanic to A::release, but it is unable to unwind any further.

The unwind info for sigpanic is:

00007324 0000000000000024 00000000 FDE cie=00000000 pc=0000000000446b80..0000000000447037
  DW_CFA_def_cfa_offset_sf: -504
  DW_CFA_advance_loc: 23 to 0000000000446b97
  DW_CFA_def_cfa_offset_sf: -464
  DW_CFA_advance_loc2: 1174 to 000000000044702d
  DW_CFA_def_cfa_offset_sf: -504
  DW_CFA_advance_loc: 9 to 0000000000447036
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop
  DW_CFA_nop

Interestingly this doesn't record any location for the saved value for rbp, and there isn't any such location in the CIE either. gdb's info frame command reports

Stack level 1, frame at 0x7fffffffe348:
 rip = 0x44702c in runtime.sigpanic
    (/home/iant/go/src/runtime/signal_unix.go:703); saved rip = 0x483ef6
 called by frame at 0x7fffffffe348, caller of frame at 0x7fffffffe318
 source language unknown.
 Arglist at 0x7fffffffe310, args: 
 Locals at 0x7fffffffe310, Previous frame's sp is 0x7fffffffe348
 Saved registers:
  rip at 0x7fffffffe340

which also does not list rbp in the list of saved registers.

I'm not sure but I think that the effect of this is that when gdb unwinds past sigpanic, it does not update rbp to the value saved on the stack. Thus when it looks at the stack for A::release, it is looking at the wrong frame location.

I spot checked the unwind info of a couple of other Go frames, and they also don't record any information about where rbp is saved. Perhaps that is a problem.

I don't plan to look into this any further myself.