golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.4k stars 17.59k forks source link

runtime: Go application segfaults when calling a symbol in a gcc compiled library #50185

Closed samohte closed 2 years ago

samohte commented 2 years ago

What version of Go are you using (go version)?

$ go version
go version go1.14.2 aix/ppc64

Does this issue reproduce with the latest release?

yes, same problem with 1.17.3

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="ppc64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOENV="/home/user/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="ppc64"
GOHOSTOS="aix"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="aix"
GOPATH="/home/labuser/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/opt/freeware/lib/golang"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/opt/freeware/lib/golang/pkg/tool/aix_ppc64"
GCCGO="gccgo"
GOPPC64="power8"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -maix64 -pthread -mcmodel=large -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build193242780=/tmp/go-build -gno-record-gcc-switches"
GOROOT/bin/go version: go version go1.14.2 aix/ppc64
GOROOT/bin/go tool compile -V: compile version go1.14.2
gdb --version: GNU gdb (GDB) 8.1.1

What did you do?

What did you expect to see?

no segfault, file handle will be leaked but no other problem expected.

What did you see instead?

go application (test) segfaults.

I did some analysis with gdb. The segfaults happens in /opt/freeware/lib/golang/src/runtime/sys_aix_ppc64.s, line 57

  >│57              MOVD    R3, (libcall_r1)(R5) 

I think this is happening: R3 is saved in line 40:

40              MOVD    R3, 48(R1) // Save libcall for later

and restored in line 57. The location where R3 was saved gets overriden when callCfunction calls close. The preloaded library libltest.so provides a symbol close which is used. This is expected. gcc creates the following code for close in libltest.so:

<close>       std     r31,-8(r1)                                                                   
<close+4>     stdu    r1,-64(r1)                                                                   
<close+8>     mr      r31,r1                                                                       
<close+12>    mr      r9,r3                                                                        
<close+16>    stw     r9,112(r31)                                                                  
<close+20>    li      r9,0                                                                         
<close+24>    mr      r3,r9                                                                        
<close+28>    addi    r1,r31,64                                                                    
<close+32>    ld      r31,-8(r1)                                                                   
<close+36>    blr                                                                                  
<close+40>    .long 0x0                                                                            
<close+44>    .long 0x2060                                                                         
<close+48>    lwz     r0,257(r1)                                                                   
<close+52>    .long 0x0                                                                            
<close+56>    .long 0x28                                                                           
<close+60>    .long 0x5636c                                                                        
<close+64>    xoris   r19,r27,25887 

The problem is caused in line

<close+16>    stw     r9,112(r31) 

The stack is increased by 64 in <close+4>, but it is accessed with an offset 112. As far as I see, gcc uses the "parameter save area" which is mentioned in the PPC specification, for example https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#STACK. According to this spec, the parameter save area shall be allocated by the caller. Therefore, the code created by gcc looks valid. Seems like the parameter save area is not allocated in the go code. testfiles.zip

cherrymui commented 2 years ago

How do you call the C symbol from Go? Could you share the source code of a complete example? Thanks.

ianlancetaylor commented 2 years ago

CC @Helflym

samohte commented 2 years ago

I do not call close in my sample. This is my sample:

$ cat go/test.go 
package main

import "fmt"
import "runtime"

func main() {
        fmt.Printf("OS: %s\nArchitecture: %s\nVersion: %s\n", runtime.GOOS, runtime.GOARCH, runtime.Version())
}

I set a break point in gdb. It looks like close is called from src/runtime/os_aix.go:243

240     func getRandomData(r []byte) {       │
241             fd := open(&urandom_dev[0], 0 /* O_RDONLY */, 0)     
242             n := read(fd, unsafe.Pointer(&r[0]), int32(len(r)))          
243             closefd(fd)          
244             extendRandom(r, int(n))       
245     }     

call stack:

(gdb) bt
#0  runtime.getRandomData (r=...) at /opt/freeware/lib/golang/src/runtime/os_aix.go:243
#1  0x00000001000368bc in runtime.fastrandinit () at /opt/freeware/lib/golang/src/runtime/proc.go:654
#2  runtime.schedinit () at /opt/freeware/lib/golang/src/runtime/proc.go:546
#3  0x0000000100060118 in runtime.rt0_go () at /opt/freeware/lib/golang/src/runtime/asm_ppc64x.s:87

/dev/urandom is opened and closed during the startup before my application code is executed. I have executed the sample program with truss -f. That's the output:

19464544: execve("./test", 0x2FF22B48, 0x20016878)       argc: 1
19464544: 21496211: kusla(6, 0x09FFFFFFF0001170)                = 0
19464544: 21496211: read_sysconfig(0x09001000A005BD08, 0x0000000000000014, 0x0000000000000079, 0x0000000000000071, 0x08FFFFFFF00000D0, 0x0000000000000091, 0x0000000000000089, 0x00000000000000B1)# = 0x0000000000000000
19464544: 21496211: checkpnt_block(0x0000000000000000, 17) = 0
19464544: 21496211: checkpnt_block(0x0000000000000000, 18) = 0
19464544: 21496211: sbrk(0x0000000000000000)            = 0x000000011005A3E0
19464544: 21496211: vmgetinfo(0x0FFFFFFFFFFFF150, 7, 16)        = 0
19464544: 21496211: sbrk(0x0000000000000000)            = 0x000000011005A3E0
19464544: 21496211: __libc_sbrk(0x0000000000010020)     = 0x000000011005A3E0
19464544: 21496211: checkpnt_block(0x0000000000000000, 17) = 0
19464544: 21496211: checkpnt_block(0x0000000000000000, 17) = 0
19464544: 21496211: checkpnt_block(0x0000000000000000, 18) = 0
19464544: 21496211: checkpnt_block(0x0000000000000000, 18) = 0
19464544: 21496211: thread_init(0x09000000005B6380, 0x09001000A011D518) = 
19464544: 21496211: sbrk(0x0000000000000000)            = 0x000000011006A400
19464544: 21496211: vmgetinfo(0x0FFFFFFFFFFFF770, 7, 16)        = 0
19464544: 21496211: smcr_procattr(0, 1, 0x0FFFFFFFFFFFF768) = 0
19464544: 21496211: getrpid(-1, -1, 648535941212897640) = 19464544
19464544: 21496211: _getpid()                           = 19464544
19464544: 21496211: getprocs64(0x000000011005F1F0, 5024, 0x0000000000000000, 0, 0x09001000A0118098, 1) = 1
19464544: 21496211: appulimit(1005, 0)                  = 0x0FFFFFFFFE000000
19464544: 21496211: _thread_self()                      = 21496211
19464544: 21496211: thread_setmystate(0x0000000000000000, 0x0FFFFFFFFFFFF2A0) = 0
19464544: 21496211: thread_setmystate(0x0FFFFFFFFFFFEEC0, 0x0FFFFFFFFFFFF248) = 0
19464544: 21496211: _sigaction(3, 0x0FFFFFFFFFFFF660, 0x0FFFFFFFFFFFF690) = 0
19464544: 21496211: _sigaction(4, 0x0FFFFFFFFFFFF660, 0x0FFFFFFFFFFFF690) = 0
19464544: 21496211: _sigaction(5, 0x0FFFFFFFFFFFF660, 0x0FFFFFFFFFFFF690) = 0
19464544: 21496211: _sigaction(6, 0x0FFFFFFFFFFFF660, 0x0FFFFFFFFFFFF690) = 0
19464544: 21496211: _sigaction(7, 0x0FFFFFFFFFFFF660, 0x0FFFFFFFFFFFF690) = 0
19464544: 21496211: _sigaction(8, 0x0FFFFFFFFFFFF660, 0x0FFFFFFFFFFFF690) = 0
19464544: 21496211: _sigaction(10, 0x0FFFFFFFFFFFF660, 0x0FFFFFFFFFFFF690) = 0
19464544: 21496211: _sigaction(11, 0x0FFFFFFFFFFFF660, 0x0FFFFFFFFFFFF690) = 0
19464544: 21496211: _sigaction(12, 0x0FFFFFFFFFFFF660, 0x0FFFFFFFFFFFF690) = 0
19464544: 21496211: _sigaction(36, 0x0FFFFFFFFFFFF660, 0x0FFFFFFFFFFFF690) = 0
19464544: 21496211: _sigaction(39, 0x0FFFFFFFFFFFF660, 0x0FFFFFFFFFFFF690) = 0
19464544: 21496211: checkpnt_block(0x0000000000000000, 17) = 0
19464544: 21496211: kloadquery(238, 0x0FFFFFFFFFFFF6A0, 24) = 0
19464544: 21496211: checkpnt_block(0x0000000000000000, 18) = 0
19464544: 21496211: checkpnt_block(0x0000000000000000, 18) = 0
19464544: 21496211: corral_getcid()                     = 0
19464544: 21496211: sysconfig(8, 0x0FFFFFFFFFFFF580, 200)       = 0
19464544: 21496211: mmap(0x0000000000000000, 262144, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_VARIABLE|MAP_PRIVATE, -1, 0) = 0x0A00000000000000
19464544: 21496211: mmap(0x0000000000000000, 131072, PROT_NONE, MAP_ANONYMOUS|MAP_VARIABLE|MAP_PRIVATE, -1, 0) = 0x0A00000000040000
19464544: 21496211: mmap(0x0000000000000000, 1048576, PROT_NONE, MAP_ANONYMOUS|MAP_VARIABLE|MAP_PRIVATE, -1, 0) = 0x0A00000000060000
19464544: 21496211: mmap(0x0000000000000000, 8388608, PROT_NONE, MAP_ANONYMOUS|MAP_VARIABLE|MAP_PRIVATE, -1, 0) = 0x0A00000000160000
19464544: 21496211: mmap(0x0000000000000000, 67108864, PROT_NONE, MAP_ANONYMOUS|MAP_VARIABLE|MAP_PRIVATE, -1, 0) = 0x0A00000000960000
19464544: 21496211: mmap(0x0000000000000000, 536870912, PROT_NONE, MAP_ANONYMOUS|MAP_VARIABLE|MAP_PRIVATE, -1, 0) = 0x0A00000004960000
19464544: 21496211: kopen("/dev/urandom", O_RDONLY)     = 3
19464544: 21496211: kread(3, " = � �\n � y86 {", 8)     = 8
19464544:     Received signal #11, SIGSEGV [default]
19464544: *** process killed ***

If I execute the program without the library preloaded, I see close(3) after kread(3, "

Helflym commented 2 years ago

Indeed, asmsyscall6 must stored its first local variable at offset 112 instead of 48. I'll fix that. Thanks for the report.

gopherbot commented 2 years ago

Change https://golang.org/cl/373074 mentions this issue: runtime: ensure that asmsyscall6 follow AIX stack convention

samohte commented 2 years ago

Thanks for your quick response. I tried the fix from the linked review and can confirm that it solves the problem with my reproducer.