golang / go

os: os.checkPidfd() crashes with SIGSYS #69065

Open cions opened 3 weeks ago

cions commented 3 weeks ago

Go version

go version go1.23.0 android/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/data/data/com.termux/files/home/.cache/go-build'
GOENV='/data/data/com.termux/files/home/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='android'
GOINSECURE=''
GOMODCACHE='/data/data/com.termux/files/home/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='android'
GOPATH='/data/data/com.termux/files/home/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/data/data/com.termux/files/home/goroot'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/data/data/com.termux/files/home/goroot/pkg/tool/android_arm64'
GOVCS=''
GOVERSION='go1.23.0'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/data/data/com.termux/files/home/.config/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/data/data/com.termux/files/usr/tmp/go-build4148197892=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Ran go env (the output above is from a patched build).

What did you see happen?

$ uname -r
4.9.186-perf+
$ go version
go version go1.23.0 android/arm64
$ go env
SIGSYS: bad system call
PC=0x5f3e3b7700 m=0 sigcode=1

goroutine 1 gp=0x40000021c0 m=0 mp=0x5f3ef8db20 [syscall]:
syscall.Syscall(0x1b2, 0x3733, 0x0, 0x0)
        syscall/syscall_linux.go:73 +0x20 fp=0x400009a410 sp=0x400009a3b0 pc=0x5f3e3d02b0
internal/syscall/unix.PidFDOpen(0xac?, 0x0?)
        internal/syscall/unix/pidfd_linux.go:18 +0x2c fp=0x400009a440 sp=0x400009a410 pc=0x5f3e42593c
os.checkPidfd()
        os/pidfd_linux.go:139 +0x48 fp=0x400009a4f0 sp=0x400009a440 pc=0x5f3e437c38
os.init.OnceValue[...].func2()
        sync/oncefunc.go:57 +0x74 fp=0x400009a550 sp=0x400009a4f0 pc=0x5f3e430054
sync.(*Once).doSlow(0x20?, 0x43?)
        sync/once.go:76 +0xf8 fp=0x400009a5b0 sp=0x400009a550 pc=0x5f3e3bfbf8
sync.(*Once).Do(0x5?, 0x0?)
        sync/once.go:67 +0x24 fp=0x400009a5d0 sp=0x400009a5b0 pc=0x5f3e3bfad4
os.init.OnceValue[...].func3()
        sync/oncefunc.go:62 +0x3c fp=0x400009a610 sp=0x400009a5d0 pc=0x5f3e42ff9c
os.pidfdWorks(...)
        os/pidfd_linux.go:124
os.ensurePidfd(0x0)
        os/pidfd_linux.go:23 +0x2c fp=0x400009a650 sp=0x400009a610 pc=0x5f3e43741c
os.startProcess({0x400020b9c0, 0x3f}, {0x4000214000, 0x6, 0x6}, 0x400009a870)
        os/exec_posix.go:41 +0xb8 fp=0x400009a740 sp=0x400009a650 pc=0x5f3e4322f8
os.StartProcess({0x400020b9c0, 0x3f}, {0x4000214000, 0x6, 0x6}, 0x400009a870)
        os/exec.go:319 +0x50 fp=0x400009a780 sp=0x400009a740 pc=0x5f3e431fa0
os/exec.(*Cmd).Start(0x4000216000)
        os/exec/exec.go:709 +0x4ac fp=0x400009a910 sp=0x400009a780 pc=0x5f3e46c6dc
os/exec.(*Cmd).Run(0x4000216000)
        os/exec/exec.go:607 +0x20 fp=0x400009a930 sp=0x400009a910 pc=0x5f3e46c1f0
os/exec.(*Cmd).CombinedOutput(0x4000216000)
        os/exec/exec.go:1021 +0x84 fp=0x400009a960 sp=0x400009a930 pc=0x5f3e46d9e4
cmd/go/internal/work.(*Builder).gccToolID(0x400019c000, {0x40000e81a3, 0x1b}, {0x5f3e9a43d0, 0x1})
        cmd/go/internal/work/buildid.go:235 +0x340 fp=0x400009ab90 sp=0x400009a960 pc=0x5f3e7f6b10
cmd/go/internal/work.(*Builder).gccCompilerID(0x400019c000, {0x40000e81a3, 0x1b})
        cmd/go/internal/work/exec.go:2609 +0x3a8 fp=0x400009add0 sp=0x400009ab90 pc=0x5f3e80c418
cmd/go/internal/work.(*Builder).gccSupportsFlag(0x400019c000, {0x40000a0950, 0x40000e81a3?, 0x2?}, {0x5f3e89674c, 0x16})
        cmd/go/internal/work/exec.go:2483 +0x418 fp=0x400009afd0 sp=0x400009add0 pc=0x5f3e80b6f8
cmd/go/internal/work.(*Builder).compilerCmd(0x400019c000, {0x40000a0950, 0x1, 0x1}, {0x5f3e9a1d58?, 0x1?}, {0x0, 0x0})
        cmd/go/internal/work/exec.go:2362 +0x460 fp=0x400009b070 sp=0x400009afd0 pc=0x5f3e80ac00
cmd/go/internal/work.(*Builder).GccCmd(0x400019c000, {0x5f3e9a1d58, 0x1}, {0x0, 0x0})
        cmd/go/internal/work/exec.go:2305 +0x100 fp=0x400009b0e0 sp=0x400009b070 pc=0x5f3e80a5c0
cmd/go/internal/envcmd.ExtraEnvVarsCostly()
        cmd/go/internal/envcmd/env.go:223 +0xe0 fp=0x400009b830 sp=0x400009b0e0 pc=0x5f3e82ec60
cmd/go/internal/envcmd.runEnv({0x5f3ebf02f0?, 0x5f3efb4300?}, 0x40000de1e0?, {0x40000a4030, 0x0, 0x0})
        cmd/go/internal/envcmd/env.go:335 +0x574 fp=0x400009b9b0 sp=0x400009b830 pc=0x5f3e82f7b4
main.invoke(0x5f3ef7d940, {0x40000a4030, 0x1, 0x1})
        cmd/go/main.go:299 +0x674 fp=0x400009bcc0 sp=0x400009b9b0 pc=0x5f3e87f254
main.main()
        cmd/go/main.go:213 +0xdb4 fp=0x400009bf40 sp=0x400009bcc0 pc=0x5f3e87e884
runtime.main()
        runtime/proc.go:272 +0x288 fp=0x400009bfd0 sp=0x400009bf40 pc=0x5f3e373558
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x400009bfd0 sp=0x400009bfd0 pc=0x5f3e3b2e24

goroutine 17 gp=0x400008c380 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xc8 fp=0x4000054790 sp=0x4000054770 pc=0x5f3e3aac08
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.forcegchelper()
        runtime/proc.go:337 +0xb8 fp=0x40000547d0 sp=0x4000054790 pc=0x5f3e3738b8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000547d0 sp=0x40000547d0 pc=0x5f3e3b2e24
created by runtime.init.7 in goroutine 1
        runtime/proc.go:325 +0x24

goroutine 18 gp=0x400008c540 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xc8 fp=0x4000054f60 sp=0x4000054f40 pc=0x5f3e3aac08
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.bgsweep(0x40000a2000)
        runtime/mgcsweep.go:277 +0xa0 fp=0x4000054fb0 sp=0x4000054f60 pc=0x5f3e35e050
runtime.gcenable.gowrap1()
        runtime/mgc.go:203 +0x28 fp=0x4000054fd0 sp=0x4000054fb0 pc=0x5f3e352018
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000054fd0 sp=0x4000054fd0 pc=0x5f3e3b2e24
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:203 +0x6c

goroutine 19 gp=0x400008c700 m=nil [GC scavenge wait]:
runtime.gopark(0x40000a2000?, 0x5f3e9a1c68?, 0x1?, 0x0?, 0x400008c700?)
        runtime/proc.go:424 +0xc8 fp=0x4000055760 sp=0x4000055740 pc=0x5f3e3aac08
runtime.goparkunlock(...)
        runtime/proc.go:430
runtime.(*scavengerState).park(0x5f3ef8bce0)
        runtime/mgcscavenge.go:425 +0x5c fp=0x4000055790 sp=0x4000055760 pc=0x5f3e35ba7c
runtime.bgscavenge(0x40000a2000)
        runtime/mgcscavenge.go:653 +0x44 fp=0x40000557b0 sp=0x4000055790 pc=0x5f3e35bfa4
runtime.gcenable.gowrap2()
        runtime/mgc.go:204 +0x28 fp=0x40000557d0 sp=0x40000557b0 pc=0x5f3e351fb8
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40000557d0 sp=0x40000557d0 pc=0x5f3e3b2e24
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0xac

goroutine 20 gp=0x400008c8c0 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xc8 fp=0x4000055d80 sp=0x4000055d60 pc=0x5f3e3aac08
runtime.runfinq()
        runtime/mfinal.go:193 +0x108 fp=0x4000055fd0 sp=0x4000055d80 pc=0x5f3e351118
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x4000055fd0 sp=0x4000055fd0 pc=0x5f3e3b2e24
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:163 +0x80

goroutine 33 gp=0x400013e000 m=nil [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:424 +0xc8 fp=0x40001446f0 sp=0x40001446d0 pc=0x5f3e3aac08
runtime.chanrecv(0x40001260e0, 0x0, 0x1)
        runtime/chan.go:639 +0x414 fp=0x4000144770 sp=0x40001446f0 pc=0x5f3e341664
runtime.chanrecv1(0x0?, 0x0?)
        runtime/chan.go:489 +0x14 fp=0x40001447a0 sp=0x4000144770 pc=0x5f3e341214
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
        runtime/mgc.go:1732
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1735 +0x3c fp=0x40001447d0 sp=0x40001447a0 pc=0x5f3e3550fc
runtime.goexit({})
        runtime/asm_arm64.s:1223 +0x4 fp=0x40001447d0 sp=0x40001447d0 pc=0x5f3e3b2e24
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1730 +0xa0

r0      0x3733
r1      0x0
r2      0x0
r3      0x0
r4      0x0
r5      0x0
r6      0x0
r7      0x6
r8      0x1b2
r9      0x6
r10     0x0
r11     0x0
r12     0x400021a6a8
r13     0x14
r14     0x40000240c0
r15     0x4
r16     0x40000983a0
r17     0x400009a4c0
r18     0x77afbd2000
r19     0x8d6dd8b5d3e2f3f4
r20     0x400009a570
r21     0x7fcdfc11b8
r22     0x4000004000
r23     0x3cb9d69707
r24     0x8e41dc0be9fcd734
r25     0x0
r26     0x5f3ebe5578
r27     0x0
r28     0x40000021c0
r29     0x400009a348
lr      0x5f3e3d026c
sp      0x400009a350
pc      0x5f3e3b7700
fault   0x0

What did you expect to see?

According to https://go.dev/wiki/MinimumRequirements, Go supports Linux kernel version 2.6.32 or later, but os.checkPidfd() unconditionally calls pidfd_open(2), which was introduced in Linux 5.3.

os.checkPidfd() should check for availability without calling system calls that may be unavailable. Alternatively, allow the use of pidfd to be disabled via GODEBUG.

Related: #62654 CC @kolyshkin

mauri870 commented 2 weeks ago

I find it weird that this crashes the process with SIGSYS. Given the way pidfd_open is implemented on Linux, it should catch the errno (ENOSYS):

https://github.com/golang/go/blob/6885bad7dd86880be6929c02085e5c7a67ff2887/src/internal/syscall/unix/pidfd_linux.go#L18-L21

The stack trace shows it is crashing in runtime_entersyscall. Perhaps there is a seccomp(2) filter in place causing the process to receive a SIGSYS?

ianlancetaylor commented 2 weeks ago

Yes, it would be extremely unfortunate if every unrecognized system call triggered a SIGSYS signal. That would make it impossible to write programs that run on both older and newer kernel versions. We need to understand what is causing that SIGSYS signal. I don't think it is the kernel.

That said, I see that this is Android. We may need to skip the pidfd calls on Android. CC @golang/android

mauri870 commented 2 weeks ago

Additionally, it would be good to see strace output to aid debugging, e.g. strace -f go env.

cions commented 2 weeks ago

I realized that the real culprit is seccomp.

$ grep Seccomp: /proc/self/status
Seccomp:        2
$ strace -fqq --signal=SIGSYS --trace=none go env
[pid 12717] --- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_call_addr=0x5a46509700, si_syscall=__NR_pidfd_open, si_arch=AUDIT_ARCH_AARCH64} ---

So we should check if seccomp is enabled?
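
For illustration, the grep above can be reproduced in Go by reading the Seccomp field from /proc/self/status. This is only a sketch: the helper name seccompMode is hypothetical, and the field reports the seccomp mode (0 disabled, 1 strict, 2 filter), not which system calls the filter allows.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// seccompMode (hypothetical helper) extracts the value of the "Seccomp:"
// field from the text of /proc/<pid>/status.
// It returns -1 if the field is missing or cannot be parsed.
func seccompMode(status string) int {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		// "Seccomp_filters:" does not match this prefix, so it is skipped.
		rest, ok := strings.CutPrefix(sc.Text(), "Seccomp:")
		if !ok {
			continue
		}
		n, err := strconv.Atoi(strings.TrimSpace(rest))
		if err != nil {
			return -1
		}
		return n
	}
	return -1
}

func main() {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	fmt.Println("seccomp mode:", seccompMode(string(data)))
}
```

A result of 2 only says that some filter is installed; it cannot tell whether pidfd_open is on the allow list.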

ianlancetaylor commented 2 weeks ago

Checking if seccomp is enabled won't really help us, because we won't know the policy.

I think we should just skip the pidfd calls if GOOS == "android".

mauri870 commented 2 weeks ago

I think the safest approach is to disable pidfd on Android.

mauri870 commented 2 weeks ago

We can probably just make android use pidfd_other.go https://github.com/golang/go/blob/master/src/os/pidfd_other.go#L5

cions commented 2 weeks ago

Alternatively, check kernel version? https://github.com/golang/go/blob/master/src/internal/syscall/unix/kernel_version_linux.go

ianlancetaylor commented 2 weeks ago

It seems to me that the kernel version isn't going to tell us anything about the seccomp policy.

gopherbot commented 2 weeks ago

Change https://go.dev/cl/608518 mentions this issue: os: don't use pidfd functions on android

cions commented 2 weeks ago

The point is not to check the seccomp policy, but to check whether the kernel version supports pidfd. Since Android with a newer kernel does not have this problem, I don't think disabling pidfd for GOOS=android is a good idea.

ianlancetaylor commented 2 weeks ago

Is there a reason to believe that the seccomp policy matches the kernel version?

In normal Linux use we don't have to check the kernel version, because the system call will fail with an ENOSYS error. In this case the above discussion suggests that it is the Android seccomp policy that is sending the SIGSYS signal. But the kernel might have been updated without updating the seccomp policy. Is there a way that we can find out?
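
The ENOSYS-based detection described here can be sketched as follows. This is not the code in os/pidfd_linux.go; the names openFunc, probePidfd, oldKernelOpen and newKernelOpen are hypothetical, and the system call is replaced by an injected function so the sketch runs anywhere.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// openFunc stands in for a pidfd_open(2) wrapper (hypothetical type).
type openFunc func(pid int) (fd int, err error)

// probePidfd calls open once and turns the result into a verdict.
// ENOSYS means the kernel does not implement the system call.
func probePidfd(open openFunc, pid int) error {
	fd, err := open(pid)
	if errors.Is(err, syscall.ENOSYS) {
		return errors.New("pidfd_open is not implemented by this kernel")
	}
	if err != nil {
		return fmt.Errorf("pidfd_open: %w", err)
	}
	_ = fd // a real probe would close the descriptor here
	return nil
}

// oldKernelOpen simulates a kernel that predates pidfd_open.
func oldKernelOpen(pid int) (int, error) { return -1, syscall.ENOSYS }

// newKernelOpen simulates a kernel that supports pidfd_open.
func newKernelOpen(pid int) (int, error) { return 3, nil }

func main() {
	fmt.Println("old kernel:", probePidfd(oldKernelOpen, 1))
	fmt.Println("new kernel:", probePidfd(newKernelOpen, 1))
}
```

This pattern only works when the call returns. Under the seccomp policy shown in the strace output, the process receives SIGSYS instead of an error value, so the probe never gets a chance to inspect errno.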

cions commented 2 weeks ago

https://github.com/termux/termux-packages/issues/21265 Users reported that Go works fine on newer Android versions.

ianlancetaylor commented 2 weeks ago

Thanks. I still don't know how to tell whether pidfd_open is supported or not. I'm certainly happy to accept a patch that has been tested on multiple versions of Android.

mauri870 commented 2 weeks ago

Any chance this was a bug in Android? I found this https://github.com/aosp-mirror/platform_bionic/commit/3de19151e508e14654a2d3204d9981c514f1c93a but it points to a private issue.

cions commented 2 weeks ago

https://android-review.googlesource.com/c/platform/bionic/+/1208625
https://cs.android.com/android/_/android/platform/bionic/+/refs/tags/android-11.0.0_r1:libc/SECCOMP_WHITELIST_COMMON.TXT;l=76

pidfd_open has been on the seccomp allow list since Android 11.

And https://source.android.com/docs/core/architecture/kernel/generic-kernel-image#inhibits-platform-upgrades

Android 10 supports 3.18, 4.4, 4.9, 4.14, and 4.19 kernels

So, checking if the kernel version is 5.3 or newer before calling pidfd_open should work on all Android devices.

cions commented 2 weeks ago

Oops, pidfd_send_signal was not allowed in Android 11, but that was fixed in 12.

Since Android 11 supports only 4.19 and 5.4 kernels (https://source.android.com/docs/core/architecture/kernel/android-common), the check should be against 5.5 rather than 5.3.
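
A minimal sketch of such a kernel version gate, assuming the release string has the form printed by uname -r. The helpers parseKernelRelease and kernelAtLeast are hypothetical and are not the functions in internal/syscall/unix. As noted earlier in the thread, the kernel version by itself says nothing about the seccomp policy.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernelRelease (hypothetical) extracts major and minor from a
// release string such as "4.9.186-perf+".
func parseKernelRelease(rel string) (int, int, bool) {
	majStr, rest, found := strings.Cut(rel, ".")
	if !found {
		return 0, 0, false
	}
	// The minor version is the leading run of digits after the first dot.
	end := 0
	for end < len(rest) && rest[end] >= '0' && rest[end] <= '9' {
		end++
	}
	major, err1 := strconv.Atoi(majStr)
	minor, err2 := strconv.Atoi(rest[:end])
	if err1 != nil || err2 != nil {
		return 0, 0, false
	}
	return major, minor, true
}

// kernelAtLeast reports whether rel is at least wantMajor.wantMinor.
// An unparseable release is treated as too old.
func kernelAtLeast(rel string, wantMajor, wantMinor int) bool {
	major, minor, ok := parseKernelRelease(rel)
	if !ok {
		return false
	}
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	for _, rel := range []string{"4.9.186-perf+", "5.4.0", "5.10.43-android12"} {
		fmt.Printf("%s >= 5.5: %v\n", rel, kernelAtLeast(rel, 5, 5))
	}
}
```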

kolyshkin commented 2 weeks ago

So, it's not the Android kernel but its seccomp policy that results in the process being killed (instead of something like returning ENOSYS). Apparently, this was fixed in Android 12; we can add a kludge to do a one-time runtime check for Android >= 12 and disable pidfd entirely if this requirement is not met, just to avoid being killed.

Alas, I can't find any code that checks the Android version in this repository.

kolyshkin commented 2 weeks ago

This also means that there's no CI in place to test Android 11, or this would have been caught earlier.

cions commented 2 weeks ago

You're right, we should check Android version >= 12 rather than kernel version.

Here is a way to get the Android version in C (sorry, I'm not familiar with cgo): https://gist.github.com/cions/07fa5f11e38945fa96916888b7e88d0c
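
On the Go side, the version comparison itself could look roughly like this. It is a sketch under assumptions: the release string is assumed to come from a system property such as ro.build.version.release, fetched through cgo, and the helper names androidMajor and pidfdAllowedOnAndroid are hypothetical. Fetching the property is not shown.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// androidMajor (hypothetical) parses the major version from a release
// string such as "11" or "8.1.0". It reports false for anything that
// does not start with a positive number, e.g. a preview codename.
func androidMajor(release string) (int, bool) {
	head, _, _ := strings.Cut(strings.TrimSpace(release), ".")
	n, err := strconv.Atoi(head)
	if err != nil || n <= 0 {
		return 0, false
	}
	return n, true
}

// pidfdAllowedOnAndroid is conservative: pidfd is used only when the
// version is known and is at least 12.
func pidfdAllowedOnAndroid(release string) bool {
	major, ok := androidMajor(release)
	return ok && major >= 12
}

func main() {
	for _, rel := range []string{"10", "11", "12", "8.1.0", "S"} {
		fmt.Printf("Android %q: use pidfd = %v\n", rel, pidfdAllowedOnAndroid(rel))
	}
}
```

Treating an unknown version as unsupported keeps the failure mode at "pidfd disabled" rather than "process killed".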

ianlancetaylor commented 2 weeks ago

Thanks. If we have to we can add a call to __system_property_get in runtime/cgo/gcc_android.c.

But honestly I think it would be simpler to just skip the pidfd calls on Android. Any patch to use them should be written by an Android developer who is able to test on various Android releases.

gopherbot commented 1 week ago

Change https://go.dev/cl/610515 mentions this issue: os: don't use pidfd on Android < 12