golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

runtime: Error related to `runtime.open_trampoline` in darwin/arm64 #48437

Open richard-ramos opened 2 years ago

richard-ramos commented 2 years ago

What version of Go are you using (go version)?

$ go version
`go version go1.17.1 darwin/arm64`

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/richard/Library/Caches/go-build"
GOENV="/Users/richard/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/richard/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/richard/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.17.1"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/0k/5ql7kl3179xbjstrcssd1pfc0000gn/T/go-build2059164481=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

What did you expect to see?

Expected to see the result of https://github.com/status-im/status-go/blob/0c0e02e93af31207fedb04f98ae6161cd4bcb3df/services/ext/api.go#L548-L550

What did you see instead?

Application crashes. When running the app with lldb, the following error is seen:

Process 5688 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001022d9ca4 libstatus.dylib`notok + 4
libstatus.dylib`notok:
->  0x1022d9ca4 <+4>:  str    x8, [x8]
    0x1022d9ca8 <+8>:  b      0x1022d9ca8               ; <+8>
    0x1022d9cac <+12>: udf    #0x0
libstatus.dylib`runtime.open_trampoline:
    0x1022d9cb0 <+0>:  str    x30, [sp, #-0x10]!
Target 0: (nim_status_client) stopped.
dr2chase commented 2 years ago

@cherrymui

cherrymui commented 2 years ago

The fault doesn't happen in open_trampoline but in notok ( https://cs.opensource.google/go/go/+/master:src/runtime/sys_darwin_arm64.s;l=17 ). notok faults on purpose, to signal that a call which should never fail has failed. It is only called when munmap, sigprocmask, sigaction, or sigaltstack fails. Could you get a stack trace to see how it got there? Thanks.
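Which of those four calls failed can be read from the runtime trampoline in the frame directly after notok in an lldb backtrace. The helper below is a minimal sketch of that lookup, assuming `bt` / `thread backtrace` output in the format posted in this thread. `FailingCall` and `notokCallers` are hypothetical names, and the caller list is taken from the comment above.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// notokCallers are the calls that, per the comment above, can end in notok
// on darwin/arm64.
var notokCallers = map[string]bool{
	"munmap":      true,
	"sigprocmask": true,
	"sigaction":   true,
	"sigaltstack": true,
}

// trampolineRe extracts the wrapped call from a runtime trampoline symbol,
// with or without the ".abi0" suffix seen in newer builds.
var trampolineRe = regexp.MustCompile(`runtime\.(\w+)_trampoline(\.abi0)?`)

// FailingCall finds the notok frame in an lldb backtrace and returns the call
// wrapped by the trampoline in the frame directly after it, plus whether that
// call is one of the known notok callers.
func FailingCall(bt string) (string, bool, error) {
	lines := strings.Split(bt, "\n")
	for i, line := range lines {
		if !strings.Contains(line, "`notok") {
			continue
		}
		if i+1 < len(lines) {
			if m := trampolineRe.FindStringSubmatch(lines[i+1]); m != nil {
				return m[1], notokCallers[m[1]], nil
			}
		}
		return "", false, errors.New("frame after notok is not a runtime trampoline")
	}
	return "", false, errors.New("no notok frame in backtrace")
}

func main() {
	// Frames copied from a backtrace posted in this thread.
	bt := "  * frame #0: 0x00000001022d9c44 libstatus.dylib`notok + 4\n" +
		"    frame #1: 0x00000001022da270 libstatus.dylib`runtime.sigaltstack_trampoline + 32\n" +
		"    frame #2: 0x00000001022d9118 libstatus.dylib`runtime.asmcgocall + 200\n"
	name, known, err := FailingCall(bt)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("failing call: %s (known notok caller: %t)\n", name, known)
}
```

The helper only inspects the frame immediately after notok. A listing that shows disassembly instead of frames, like the first lldb excerpt above, is reported as an error rather than guessed at.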

richard-ramos commented 2 years ago

Hello @cherrymui, here's what I got after running thread backtrace 1:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001022d9c44 libstatus.dylib`notok + 4
    frame #1: 0x00000001022da270 libstatus.dylib`runtime.sigaltstack_trampoline + 32
    frame #2: 0x00000001022d9118 libstatus.dylib`runtime.asmcgocall + 200
    frame #3: 0x00000001022d9118 libstatus.dylib`runtime.asmcgocall + 200
    frame #4: 0x00000001022d9118 libstatus.dylib`runtime.asmcgocall + 200

Thank you!

cherrymui commented 2 years ago

This looks like sigaltstack fails, which is weird. Does the C (or other language) part of your binary (which links against the Go library) do anything weird with the signal stack? Could you provide more information about how you run the binary? Thanks.

richard-ramos commented 2 years ago

Does the C (or other language) part of your binary (which links against the Go library) do anything weird with the signal stack?

I use Nim, which compiles to C. How can I identify whether something is done with the signal stack?

fzwoch commented 2 years ago

Seeing similar here: https://github.com/fzwoch/obs-teleport/issues/43

It happens with sigaltstack as well as with open.

A little context: OBS Studio is a C library with a C++ UI (Qt). It loads plugins via a C interface, and plugins can hook back into its system via a C interface too. So the plugin is built with -buildmode=c-shared.

This has been working fine on windows/x86_64, linux/x86_64, and darwin/x86_64. OBS Studio recently released its first darwin/arm64 port, which is where I noticed this issue.

cherrymui commented 2 years ago

Is the c-shared library used by a C/C++ program, or a Go program? Is it the only Go part in that program? Thanks.

fzwoch commented 2 years ago

The shared library is used by a C/C++ program. And yes, it is the only part in that system that is written in Go.

The application can be downloaded here: https://github.com/obsproject/obs-studio/releases/tag/28.0.0-beta1

For the shared library to be loaded, it can be placed in this location: ~/Library/Application\ Support/obs-studio/plugins/[shared_lib]/bin/[shared_lib].so

[shared_lib] can be any name of your choice, but the naming convention matters for the library to be loaded on startup. It also requires the .so extension instead of .dylib; I think that is due to some historic reason.
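The install location described above can be computed as shown below. This is a minimal sketch that assumes the plugins/[shared_lib]/bin/[shared_lib].so layout used by the OBS 28.0.0 beta mentioned here. `pluginPath` is a hypothetical helper, and "obs-test" is only a placeholder name.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// pluginPath returns where OBS looks for a plugin shared library named name:
// <home>/Library/Application Support/obs-studio/plugins/<name>/bin/<name>.so
// Note the .so extension, not .dylib.
func pluginPath(home, name string) string {
	return filepath.Join(home, "Library", "Application Support", "obs-studio",
		"plugins", name, "bin", name+".so")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine home directory:", err)
		return
	}
	fmt.Println(pluginPath(home, "obs-test"))
}
```

The path contains a space ("Application Support"). If you paste it into a shell, it needs quoting or the backslash escape used above.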

Upon running the application, it freezes/hangs. When running it from lldb, one can see the Go runtime crash. Running with a debugger may require changing the app's entitlements, though.

Alternatively, the project can be built with its CI/build-macos.sh script.

mknyszek commented 2 years ago

@cherrymui Assigning to you for now since you last replied but please feel free to unassign!

fzwoch commented 1 year ago

I tried again with the most recent code from OBS. This time it crashes at

runtime.asmcgocall.abi0

->  0x161f8d7e8 <+200>: ldr    x2, [sp, #0x8]

(it then calls notok again, which produces the final crash):

->  0x161f8e384 <+4>:  str    x8, [x8]

Is there any special memory layout required for Go that the C++/Qt framework may break?

Interestingly enough, when I enable the Address Sanitizer from within Xcode and let the same code run it behaves perfectly fine. The Address Sanitizer also does not complain about anything being wrong or odd.

fzwoch commented 1 year ago

Looking at it again, I will try to provide a complete repro case:

Using the following code:

package main

//#include <stdbool.h>
//#include <stdint.h>
import "C"
import (
    "unsafe"
)

var obsModulePointer unsafe.Pointer

//export obs_module_set_pointer
func obs_module_set_pointer(module unsafe.Pointer) {
    obsModulePointer = module
}

//export obs_current_module
func obs_current_module() unsafe.Pointer {
    return obsModulePointer
}

//export obs_module_ver
func obs_module_ver() C.uint32_t {
    return 0
}

//export obs_module_load
func obs_module_load() C.bool {
    return true
}

func main() {}

Compiled with (Go 1.19.3, but also current HEAD):

go build -buildmode=c-shared -o obs-test

The result is wrapped in a small plugin structure:

obs-test.plugin
obs-test.plugin/Contents
obs-test.plugin/Contents/MacOS
obs-test.plugin/Contents/MacOS/obs-test
obs-test.plugin/Contents/Info.plist

where obs-test is the shared library and Info.plist contains:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>CFBundleExecutable</key>
    <string>obs-test</string>
</dict>
</plist>

This plugin structure is then installed at

~/Library/Application\ Support/obs-studio/plugins/obs-teleport.plugin

I am running OBS 29.0.0 Beta 2, which is pre-compiled.

To be able to attach LLDB to the application, I needed to run:

codesign -s - -f --entitlements debuggee-entitlement.xml /Applications/OBS.app

with debuggee-entitlement.xml containing the following:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.get-task-allow</key>
    <true/>
</dict>
</plist>

Running OBS with the debugger will show the crash on startup.

If I have followed the code path correctly, the OBS app successfully opens the library and loads the required symbols from it. Next, it calls the library's obs_module_set_pointer() function to set some C memory pointer, but it crashes before reaching the function's body. If I add a func init() { fmt.Println("Init") } to the shared library, I can see that this part runs successfully.

lldb /Applications/OBS.app/Contents/MacOS/OBS

The crash trace:

Process 1119 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000013f5240b4 obs-test`notok + 4
obs-test`notok:
->  0x13f5240b4 <+4>:  str    x8, [x8]
    0x13f5240b8 <+8>:  b      0x13f5240b8               ; <+8>
    0x13f5240bc <+12>: udf    #0x0

obs-test`runtime.open_trampoline.abi0:
    0x13f5240c0 <+0>:  str    x30, [sp, #-0x10]!
Target 0: (OBS) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000013f5240b4 obs-test`notok + 4
    frame #1: 0x000000013f524660 obs-test`runtime.sigaltstack_trampoline.abi0 + 32
    frame #2: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
    frame #3: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
    frame #4: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
(lldb) up
frame #1: 0x000000013f524660 obs-test`runtime.sigaltstack_trampoline.abi0 + 32
obs-test`runtime.sigaltstack_trampoline.abi0:
->  0x13f524660 <+32>: ldp    x29, x30, [sp, #-0x8]
    0x13f524664 <+36>: add    sp, sp, #0x10
    0x13f524668 <+40>: ret    
    0x13f52466c <+44>: udf    #0x0
(lldb) up
frame #2: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
obs-test`runtime.asmcgocall.abi0:
->  0x13f523538 <+200>: ldr    x2, [sp, #0x8]
    0x13f52353c <+204>: mov    sp, x2
    0x13f523540 <+208>: str    x0, [sp, #0x28]
    0x13f523544 <+212>: ldp    x29, x30, [sp, #-0x8]
(lldb) up
frame #3: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
obs-test`runtime.asmcgocall.abi0:
->  0x13f523538 <+200>: ldr    x2, [sp, #0x8]
    0x13f52353c <+204>: mov    sp, x2
    0x13f523540 <+208>: str    x0, [sp, #0x28]
    0x13f523544 <+212>: ldp    x29, x30, [sp, #-0x8]
(lldb) up
frame #4: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
obs-test`runtime.asmcgocall.abi0:
->  0x13f523538 <+200>: ldr    x2, [sp, #0x8]
    0x13f52353c <+204>: mov    sp, x2
    0x13f523540 <+208>: str    x0, [sp, #0x28]
    0x13f523544 <+212>: ldp    x29, x30, [sp, #-0x8]

Callstack of all threads at this point:

(lldb) bt all
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000013f5240b4 obs-test`notok + 4
    frame #1: 0x000000013f524660 obs-test`runtime.sigaltstack_trampoline.abi0 + 32
    frame #2: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
    frame #3: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
  * frame #4: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
  thread #2
    frame #0: 0x000000019c68072c libsystem_kernel.dylib`__workq_kernreturn + 8
  thread #3
    frame #0: 0x000000019c68072c libsystem_kernel.dylib`__workq_kernreturn + 8
  thread #4
    frame #0: 0x000000019c68072c libsystem_kernel.dylib`__workq_kernreturn + 8
  thread #5, name = 'libobs: hotkey thread'
    frame #0: 0x000000019c682270 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000019c6bc83c libsystem_pthread.dylib`_pthread_cond_wait + 1236
    frame #2: 0x0000000101638d40 libobs`os_event_timedwait + 128
    frame #3: 0x00000001015c4284 libobs`obs_hotkey_thread + 160
    frame #4: 0x000000019c6bc26c libsystem_pthread.dylib`_pthread_start + 148
  thread #6, name = 'tiny_tubular_task_thread'
    frame #0: 0x000000019c67e8ec libsystem_kernel.dylib`semaphore_wait_trap + 8
    frame #1: 0x0000000101638fc4 libobs`os_sem_wait + 20
    frame #2: 0x0000000101633f30 libobs`tiny_tubular_task_thread + 220
    frame #3: 0x000000019c6bc26c libsystem_pthread.dylib`_pthread_start + 148
  thread #7
    frame #0: 0x0000000000000000
  thread #8, name = 'audio-io: audio thread'
    frame #0: 0x000000019c68206c libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x000000019c58afc8 libsystem_c.dylib`nanosleep + 220
    frame #2: 0x000000019c58aee0 libsystem_c.dylib`usleep + 68
    frame #3: 0x00000001016383e4 libobs`os_sleepto_ns_fast + 100
    frame #4: 0x000000010162205c libobs`audio_thread + 1000
    frame #5: 0x000000019c6bc26c libsystem_pthread.dylib`_pthread_start + 148
  thread #9, name = 'video-io: video thread'
    frame #0: 0x000000019c67e8ec libsystem_kernel.dylib`semaphore_wait_trap + 8
    frame #1: 0x0000000101638fc4 libobs`os_sem_wait + 20
    frame #2: 0x0000000101624fd0 libobs`video_thread + 84
    frame #3: 0x000000019c6bc26c libsystem_pthread.dylib`_pthread_start + 148
  thread #10, name = 'libobs: graphics thread'
    frame #0: 0x000000019c68206c libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x000000019c58afc8 libsystem_c.dylib`nanosleep + 220
    frame #2: 0x000000010163835c libobs`os_sleepto_ns + 124
    frame #3: 0x00000001015fc458 libobs`obs_graphics_thread_loop + 5352
    frame #4: 0x0000000101637188 libobs`obs_graphics_thread_loop_autorelease + 32
    frame #5: 0x00000001015fc6e0 libobs`obs_graphics_thread + 168
    frame #6: 0x000000010163714c libobs`obs_graphics_thread_autorelease + 32
    frame #7: 0x000000019c6bc26c libsystem_pthread.dylib`_pthread_start + 148
  thread #11, name = 'scripting: defer'
    frame #0: 0x000000019c67e8ec libsystem_kernel.dylib`semaphore_wait_trap + 8
    frame #1: 0x0000000101638fc4 libobs`os_sem_wait + 20
    frame #2: 0x000000010fdfe84c libobs-scripting.29.dylib`defer_thread + 56
    frame #3: 0x000000019c6bc26c libsystem_pthread.dylib`_pthread_start + 148
  thread #12
    frame #0: 0x000000019c682270 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000019c6bc83c libsystem_pthread.dylib`_pthread_cond_wait + 1236
    frame #2: 0x000000013f524878 obs-test`runtime.pthread_cond_wait_trampoline.abi0 + 24
    frame #3: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
    frame #4: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
    frame #5: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
  thread #13
    frame #0: 0x000000019c68206c libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x000000019c58afc8 libsystem_c.dylib`nanosleep + 220
    frame #2: 0x000000019c58aee0 libsystem_c.dylib`usleep + 68
    frame #3: 0x000000013f524514 obs-test`runtime.usleep_trampoline.abi0 + 20
    frame #4: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
    frame #5: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
  thread #14
    frame #0: 0x000000019c682270 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000019c6bc83c libsystem_pthread.dylib`_pthread_cond_wait + 1236
    frame #2: 0x000000013f524878 obs-test`runtime.pthread_cond_wait_trampoline.abi0 + 24
    frame #3: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
    frame #4: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
    frame #5: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
  thread #15
    frame #0: 0x000000019c682270 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000019c6bc83c libsystem_pthread.dylib`_pthread_cond_wait + 1236
    frame #2: 0x000000013f524878 obs-test`runtime.pthread_cond_wait_trampoline.abi0 + 24
    frame #3: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
  thread #16
    frame #0: 0x000000019c682270 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000019c6bc83c libsystem_pthread.dylib`_pthread_cond_wait + 1236
    frame #2: 0x000000013f524878 obs-test`runtime.pthread_cond_wait_trampoline.abi0 + 24
    frame #3: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
    frame #4: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
    frame #5: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
  thread #17
    frame #0: 0x000000019c682270 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x000000019c6bc83c libsystem_pthread.dylib`_pthread_cond_wait + 1236
    frame #2: 0x000000013f524878 obs-test`runtime.pthread_cond_wait_trampoline.abi0 + 24
    frame #3: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
    frame #4: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200
    frame #5: 0x000000013f523538 obs-test`runtime.asmcgocall.abi0 + 200

I tried reducing the number of plugins being loaded, without any significant difference. I checked the code for its own use of sigaltstack but could not find anything, so I think it is Go's internal call to change the stack (?).

Is there anything else I can provide or try to isolate the issue? This seems to happen at a level where my expertise is quite weak. Ideally we find a fix, but knowing why it goes wrong, and understanding whether this is a technical limitation, would help ease my mind.

fzwoch commented 1 year ago

Tracing sigaltstack calls with dtruss:

 4701/0x5fa3:  sigaltstack(0x0, 0x177B56EF0, 0x0)        = 0 0
 4701/0x5fa3:  sigaltstack(0x177B56EB0, 0x0, 0x0)        = 0 0
 4701/0x5fa4:  fork()        = 0 0
 4701/0x5fa4:  sigaltstack(0x0, 0x177BE6EA0, 0x0)        = 0 0
 4701/0x5fa5:  fork()        = 0 0
 4701/0x5fa5:  sigaltstack(0x0, 0x177C76EA0, 0x0)        = 0 0
 4701/0x5fa5:  sigaltstack(0x177C76E60, 0x0, 0x0)        = 0 0
 4701/0x5fa4:  sigaltstack(0x177BE6E60, 0x0, 0x0)        = 0 0
 4701/0x5fa6:  fork()        = 0 0
 4701/0x5fa6:  sigaltstack(0x0, 0x177D0AEA0, 0x0)        = 0 0
 4701/0x5fa6:  sigaltstack(0x177D0AE60, 0x0, 0x0)        = 0 0
 4701/0x5fa7:  fork()        = 0 0
 4701/0x5fa7:  sigaltstack(0x0, 0x177D9AEA0, 0x0)        = 0 0
 4701/0x5fa7:  sigaltstack(0x177D9AE60, 0x0, 0x0)        = 0 0
 4701/0x5fa8:  fork()        = 0 0
 4701/0x5fa8:  sigaltstack(0x0, 0x177E2AEA0, 0x0)        = 0 0
 4701/0x5f73:  sigaltstack(0x0, 0x16D96A1D0, 0x0)        = 0 0
 4701/0x5f73:  sigaltstack(0x16D96A190, 0x0, 0x0)        = -1 Err#1
 4701/0x5fa8:  sigaltstack(0x177E2AE60, 0x0, 0x0)        = 0 0

sigaltstack seems to fail with errno 1, which is EPERM:

[EPERM] An attempt was made to modify an active stack.

or, from the Linux man page:

EPERM An attempt was made to change the alternate signal stack while it was active (i.e., the thread was already executing on the current alternate signal stack).

fzwoch commented 1 year ago

Small heads-up: my issue seems not to be connected to Go. At some point in OBS, simply calling sigaltstack() fails with EPERM. The reason for that is a mystery to me, though.

cherrymui commented 1 year ago

Thanks for the investigation. I think the Go runtime only installs a signal stack if one is not already installed, so if a sigaltstack is active, Go wouldn't change it. Yeah, it is weird...
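The behavior described above can be sketched as a one-line decision. This is a simplified model, not the runtime's code. The `!usesCgo` branch reflects our reading of minitSignalStack in src/runtime/signal_unix.go and may differ between Go versions. The type and function names (`threadStack`, `installsOwnStack`) are hypothetical.

```go
package main

import "fmt"

// threadStack is what sigaltstack(NULL, &old) reports for the current thread.
type threadStack struct {
	disabled bool // SS_DISABLE: no alternate signal stack installed
	onStack  bool // SS_ONSTACK: currently executing on it (not consulted below)
}

// installsOwnStack reports whether the simplified rule would have Go install
// its own alternate signal stack on this thread. In a cgo-using binary (which
// includes -buildmode=c-shared libraries) an existing stack is kept and only a
// thread without one gets Go's stack; pure-Go binaries always install theirs.
func installsOwnStack(cur threadStack, usesCgo bool) bool {
	return cur.disabled || !usesCgo
}

func main() {
	cases := []struct {
		desc    string
		cur     threadStack
		usesCgo bool
	}{
		{"c-shared, host installed no stack", threadStack{disabled: true}, true},
		{"c-shared, host stack present", threadStack{}, true},
		{"c-shared, host stack active", threadStack{onStack: true}, true},
		{"pure Go program", threadStack{}, false},
	}
	for _, c := range cases {
		fmt.Printf("%-36s install=%t\n", c.desc, installsOwnStack(c.cur, c.usesCgo))
	}
}
```

Under this model, Go only issues a replacement call on a thread whose query reported no installed stack. By the man-page rule quoted above, such a call should not hit EPERM.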