golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.99k stars 17.67k forks source link

runtime: crashes on linux/riscv64 during runtime/pprof and os/signal tests #49709

Closed rhysh closed 2 years ago

rhysh commented 2 years ago

What version of Go are you using (go version)?

$ go version
go version go1.17.3 darwin/amd64

(Cross-compiling to linux/riscv64)

Does this issue reproduce with the latest release?

Yes, this problem appears with go1.17.3, the latest stable release.

What operating system and processor architecture are you using (go env)?

I only have a working Go installation on my darwin/amd64 machine (below). I'm cross-compiling from there to linux/riscv64, where I see the problem.

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/rhysh/Library/Caches/go-build"
GOENV="/Users/rhysh/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/rhysh/work/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/rhysh/work"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17.3"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/pw/d_qmtcrd3vs0890gvmrq8qx80000gn/T/go-build338083123=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I've got a small RISCV64 computer: Nezha with the Allwinner D1 SOC. I'm running the runtime/pprof tests at version go1.17.3 and seeing several types of "fatal error" failures.

$ GOOS=linux GOARCH=riscv64 go test -c -o /tmp/pprof.go1.17.3.test runtime/pprof
$ GOOS=linux GOARCH=riscv64 go build -o /tmp/stress golang.org/x/tools/cmd/stress
$ go version /tmp/pprof.go1.17.3.test 
/tmp/pprof.go1.17.3.test: go1.17.3
$ # scp and ssh to the linux/riscv64 host

sipeed@sipeed:~$ uname -a
Linux sipeed 5.4.61 #15 PREEMPT Thu Jun 10 06:50:03 UTC 2021 riscv64 GNU/Linux
sipeed@sipeed:~$ nohup ./stress ./pprof.go1.17.3.test -test.run=TestCPUProfileMultithreaded -test.short=true &

What did you expect to see?

Near-zero failure rate of runtime/pprof's TestCPUProfileMultithreaded test in short mode when using go1.17.3, and zero crashes.

What did you see instead?

Currently about 1.5% failure rate, all of which are crashes ("fatal error"). (On the plus side, I haven't seen any test failures for that test.)

In the last 42 minutes, the stress run has collected:

Here's an example of each:

"fatal error: unexpected signal during runtime execution" / "[signal SIGSEGV: segmentation violation code=0x1 addr=0x2 pc=0x2]" ``` sipeed@sipeed:~$ cat /tmp/go-stress-20211121T161735-4177513427 fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x2 pc=0x2] goroutine 0 [idle]: runtime: unexpected return pc for runtime.sigpanic called from 0x2 stack: frame={sp:0x3fb005b730, fp:0x3fb005b760} stack=[0x3fb0054000,0x3fb005c000) 0x0000003fb005b630: 0x0000000000000000 0x000000000004cb6c 0x0000003fb005b640: 0x000000000004c8c0 0x0000003fb005b708 0x0000003fb005b650: 0x0000000000000000 0x0000003fb0000ea0 0x0000003fb005b660: 0x010000000004cea4 0x0000000000000004 0x0000003fb005b670: 0x000000000000001f 0x0000000000000002 0x0000003fb005b680: 0x0000000000000002 0x0000000000000001 0x0000003fb005b690: 0x00000000001ae451 0x000000000004cafc 0x0000003fb005b6a0: 0x0000003fb0000ea0 0x000000000004c8c0 0x0000003fb005b6b0: 0x0000003fb005b708 0x000000000004e4d8 0x0000003fb005b6c0: 0x0000003fb005b708 0x000000000004c8c0 0x0000003fb005b6d0: 0x0000003fb0000ea0 0x000000000004c8c0 0x0000003fb005b6e0: 0x0000003fb005b6e8 0x000000000004cb10 0x0000003fb005b6f0: 0x0000003fb0000ea0 0x000000000004c8c0 0x0000003fb005b700: 0x0000003fb005b708 0x0000000000066a3c 0x0000003fb005b710: 0x0000003fb005b718 0x000000000004c8d8 0x0000003fb005b720: 0x00000000001b170e 0x000000000000002a 0x0000003fb005b730: <0x0000000000000002 0x00000000001b170e 0x0000003fb005b740: 0x000000000000002a 0x0000000000000080 0x0000003fb005b750: 0x0000000000000003 0x0000000000000000 0x0000003fb005b760: >0x0000000000000003 0x00000000000658a4 0x0000003fb005b770: 0x0000000000158554 0x00000000002a7328 0x0000003fb005b780: 0x00000000002b6f80 0x0000000000000000 0x0000003fb005b790: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7a0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7b0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7c0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7d0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7e0: 0x0000000000065ef8 0x0000003fb00ba820 0x0000003fb005b7f0: 0x0000000000158554 0x0000003fb002ef58 0x0000003fb005b800: 0x00000000001584d4 0x0000000000065f98 0x0000003fb005b810: 0x000000000006605c 0x0000000000280638 0x0000003fb005b820: 0x00000000002b6f80 0x0000000000065ad4 0x0000003fb005b830: 0x0000003fb00ba820 0x0000003fb005b8c8 0x0000003fb005b840: 0x00000000001584d4 0x0000003fb00ba820 0x0000003fb005b850: 0x0000003fb0030800 0x0000000000000000 runtime.throw({0x1b170e, 0x2a}) /usr/local/go/src/runtime/panic.go:1198 +0x60 runtime: unexpected return pc for runtime.sigpanic called from 0x2 stack: frame={sp:0x3fb005b730, fp:0x3fb005b760} stack=[0x3fb0054000,0x3fb005c000) 0x0000003fb005b630: 0x0000000000000000 0x000000000004cb6c 0x0000003fb005b640: 0x000000000004c8c0 0x0000003fb005b708 0x0000003fb005b650: 0x0000000000000000 0x0000003fb0000ea0 0x0000003fb005b660: 0x010000000004cea4 0x0000000000000004 0x0000003fb005b670: 0x000000000000001f 0x0000000000000002 0x0000003fb005b680: 0x0000000000000002 0x0000000000000001 0x0000003fb005b690: 0x00000000001ae451 0x000000000004cafc 0x0000003fb005b6a0: 0x0000003fb0000ea0 0x000000000004c8c0 0x0000003fb005b6b0: 0x0000003fb005b708 0x000000000004e4d8 0x0000003fb005b6c0: 0x0000003fb005b708 0x000000000004c8c0 0x0000003fb005b6d0: 0x0000003fb0000ea0 0x000000000004c8c0 0x0000003fb005b6e0: 0x0000003fb005b6e8 0x000000000004cb10 0x0000003fb005b6f0: 0x0000003fb0000ea0 0x000000000004c8c0 0x0000003fb005b700: 0x0000003fb005b708 0x0000000000066a3c 0x0000003fb005b710: 0x0000003fb005b718 0x000000000004c8d8 0x0000003fb005b720: 0x00000000001b170e 0x000000000000002a 0x0000003fb005b730: <0x0000000000000002 0x00000000001b170e 0x0000003fb005b740: 0x000000000000002a 0x0000000000000080 0x0000003fb005b750: 0x0000000000000003 0x0000000000000000 0x0000003fb005b760: >0x0000000000000003 0x00000000000658a4 0x0000003fb005b770: 0x0000000000158554 0x00000000002a7328 0x0000003fb005b780: 0x00000000002b6f80 0x0000000000000000 0x0000003fb005b790: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7a0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7b0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7c0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7d0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7e0: 0x0000000000065ef8 0x0000003fb00ba820 0x0000003fb005b7f0: 0x0000000000158554 0x0000003fb002ef58 0x0000003fb005b800: 0x00000000001584d4 0x0000000000065f98 0x0000003fb005b810: 0x000000000006605c 0x0000000000280638 0x0000003fb005b820: 0x00000000002b6f80 0x0000000000065ad4 0x0000003fb005b830: 0x0000003fb00ba820 0x0000003fb005b8c8 0x0000003fb005b840: 0x00000000001584d4 0x0000003fb00ba820 0x0000003fb005b850: 0x0000003fb0030800 0x0000000000000000 runtime.sigpanic() /usr/local/go/src/runtime/signal_unix.go:719 +0x29c goroutine 8 [running]: fatal error: unexpected signal during runtime execution panic during panic [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x767d0] goroutine 0 [idle]: runtime: unexpected return pc for runtime.sigpanic called from 0x2 stack: frame={sp:0x3fb005b730, fp:0x3fb005b760} stack=[0x3fb0054000,0x3fb005c000) 0x0000003fb005b630: 0x0000000000000000 0x000000000004cb6c 0x0000003fb005b640: 0x0000003fb0000ea0 0x0000003fb005b708 0x0000003fb005b650: 0x0000000000000000 0x0000003fb0000ea0 0x0000003fb005b660: 0x010000000004cea4 0x0000000000000004 0x0000003fb005b670: 0x000000000000001f 0x0000000000000002 0x0000003fb005b680: 0x0000000000000002 0x0000000000000001 0x0000003fb005b690: 0x00000000001ae451 0x000000000004cafc 0x0000003fb005b6a0: 0x0000003fb0000ea0 0x000000000004c8c0 0x0000003fb005b6b0: 0x0000003fb005b708 0x000000000004e4d8 0x0000003fb005b6c0: 0x0000003fb005b708 0x000000000004c8c0 0x0000003fb005b6d0: 0x0000003fb0000ea0 0x000000000004c8c0 0x0000003fb005b6e0: 0x0000003fb005b6e8 0x000000000004cb10 0x0000003fb005b6f0: 0x0000003fb0000ea0 0x000000000004c8c0 0x0000003fb005b700: 0x0000003fb005b708 0x0000000000066a3c 0x0000003fb005b710: 0x0000003fb005b718 0x000000000004c8d8 0x0000003fb005b720: 0x00000000001b170e 0x000000000000002a 0x0000003fb005b730: <0x0000000000000002 0x00000000001b170e 0x0000003fb005b740: 0x000000000000002a 0x0000000000000080 0x0000003fb005b750: 0x0000000000000003 0x0000000000000000 0x0000003fb005b760: >0x0000000000000003 0x00000000000658a4 0x0000003fb005b770: 0x0000000000158554 0x00000000002a7328 0x0000003fb005b780: 0x00000000002b6f80 0x0000000000000000 0x0000003fb005b790: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7a0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7b0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7c0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7d0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7e0: 0x0000000000065ef8 0x0000003fb00ba820 0x0000003fb005b7f0: 0x0000000000158554 0x0000003fb002ef58 0x0000003fb005b800: 0x00000000001584d4 0x0000000000065f98 0x0000003fb005b810: 0x000000000006605c 0x0000000000280638 0x0000003fb005b820: 0x00000000002b6f80 0x0000000000065ad4 0x0000003fb005b830: 0x0000003fb00ba820 0x0000003fb005b8c8 0x0000003fb005b840: 0x00000000001584d4 0x0000003fb00ba820 0x0000003fb005b850: 0x0000003fb0030800 0x0000000000000000 runtime.throw({0x1b170e, 0x2a}) /usr/local/go/src/runtime/panic.go:1198 +0x60 runtime.sigpanic() /usr/local/go/src/runtime/signal_unix.go:719 +0x29c runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0x3fb00ba820, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0) /usr/local/go/src/runtime/traceback.go:247 +0x4d8 runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0x3fb00ba820, 0x0) /usr/local/go/src/runtime/traceback.go:823 +0x14c runtime.traceback(...) /usr/local/go/src/runtime/traceback.go:777 runtime.tracebackothers(0x3fb0000ea0) /usr/local/go/src/runtime/traceback.go:1015 +0xcc runtime.dopanic_m(0x3fb0000ea0, 0x4c8c0, 0x3fb005b708) /usr/local/go/src/runtime/panic.go:1398 +0x350 runtime.fatalthrow.func1() /usr/local/go/src/runtime/panic.go:1253 +0x5c runtime.fatalthrow() /usr/local/go/src/runtime/panic.go:1250 +0x4c runtime.throw({0x1b170e, 0x2a}) /usr/local/go/src/runtime/panic.go:1198 +0x60 runtime: unexpected return pc for runtime.sigpanic called from 0x2 stack: frame={sp:0x3fb005b730, fp:0x3fb005b760} stack=[0x3fb0054000,0x3fb005c000) 0x0000003fb005b630: 0x0000000000000000 0x000000000004cb6c 0x0000003fb005b640: 0x0000003fb0000ea0 0x0000003fb005b708 0x0000003fb005b650: 0x0000000000000000 0x0000003fb0000ea0 0x0000003fb005b660: 0x010000000004cea4 0x0000000000000004 0x0000003fb005b670: 0x000000000000001f 0x0000000000000002 0x0000003fb005b680: 0x0000000000000002 0x0000000000000001 0x0000003fb005b690: 0x00000000001ae451 0x000000000004cafc 0x0000003fb005b6a0: 0x0000003fb0000ea0 0x000000000004c8c0 0x0000003fb005b6b0: 0x0000003fb005b708 0x000000000004e4d8 0x0000003fb005b6c0: 0x0000003fb005b708 0x000000000004c8c0 0x0000003fb005b6d0: 0x0000003fb0000ea0 0x000000000004c8c0 0x0000003fb005b6e0: 0x0000003fb005b6e8 0x000000000004cb10 0x0000003fb005b6f0: 0x0000003fb0000ea0 0x000000000004c8c0 0x0000003fb005b700: 0x0000003fb005b708 0x0000000000066a3c 0x0000003fb005b710: 0x0000003fb005b718 0x000000000004c8d8 0x0000003fb005b720: 0x00000000001b170e 0x000000000000002a 0x0000003fb005b730: <0x0000000000000002 0x00000000001b170e 0x0000003fb005b740: 0x000000000000002a 0x0000000000000080 0x0000003fb005b750: 0x0000000000000003 0x0000000000000000 0x0000003fb005b760: >0x0000000000000003 0x00000000000658a4 0x0000003fb005b770: 0x0000000000158554 0x00000000002a7328 0x0000003fb005b780: 0x00000000002b6f80 0x0000000000000000 0x0000003fb005b790: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7a0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7b0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7c0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7d0: 0x0000000000000000 0x0000000000000000 0x0000003fb005b7e0: 0x0000000000065ef8 0x0000003fb00ba820 0x0000003fb005b7f0: 0x0000000000158554 0x0000003fb002ef58 0x0000003fb005b800: 0x00000000001584d4 0x0000000000065f98 0x0000003fb005b810: 0x000000000006605c 0x0000000000280638 0x0000003fb005b820: 0x00000000002b6f80 0x0000000000065ad4 0x0000003fb005b830: 0x0000003fb00ba820 0x0000003fb005b8c8 0x0000003fb005b840: 0x00000000001584d4 0x0000003fb00ba820 0x0000003fb005b850: 0x0000003fb0030800 0x0000000000000000 runtime.sigpanic() /usr/local/go/src/runtime/signal_unix.go:719 +0x29c ERROR: exit status 2 ```
"runtime: unexpected return pc for runtime.sigpanic called from 0x0" / "fatal error: unknown caller pc" ``` sipeed@sipeed:~$ cat /tmp/go-stress-20211121T161735-4026159646 runtime: unexpected return pc for runtime.sigpanic called from 0x0 stack: frame={sp:0x3fb405b910, fp:0x3fb405b940} stack=[0x3fb402e800,0x3fb402f000) fatal error: unknown caller pc runtime stack: runtime.throw({0x1aa247, 0x11}) /usr/local/go/src/runtime/panic.go:1198 +0x60 runtime.gentraceback(0x49820, 0x3fb405b8f8, 0x0, 0x3fb40ba820, 0x0, 0x0, 0x7fffffff, 0x3fb403bfc0, 0x0, 0x0) /usr/local/go/src/runtime/traceback.go:274 +0x17c0 runtime.addOneOpenDeferFrame.func1() /usr/local/go/src/runtime/panic.go:751 +0x88 runtime.systemstack() /usr/local/go/src/runtime/asm_riscv64.s:131 +0x58 goroutine 8 [running]: runtime.systemstack_switch() /usr/local/go/src/runtime/asm_riscv64.s:94 +0x8 fp=0x3fb405b7f0 sp=0x3fb405b7e8 pc=0x80768 runtime.addOneOpenDeferFrame(0x3fb40ba820, 0x49820, 0x3fb405b8f8) /usr/local/go/src/runtime/panic.go:750 +0x80 fp=0x3fb405b828 sp=0x3fb405b7f0 pc=0x4b160 panic({0x1871a0, 0x2c5ab0}) /usr/local/go/src/runtime/panic.go:998 +0xcc fp=0x3fb405b8f8 sp=0x3fb405b828 pc=0x4c09c runtime.panicmem() /usr/local/go/src/runtime/panic.go:221 +0x60 fp=0x3fb405b910 sp=0x3fb405b8f8 pc=0x49820 runtime: unexpected return pc for runtime.sigpanic called from 0x0 stack: frame={sp:0x3fb405b910, fp:0x3fb405b940} stack=[0x3fb402e800,0x3fb402f000) runtime.sigpanic() /usr/local/go/src/runtime/signal_unix.go:735 +0x264 fp=0x3fb405b940 sp=0x3fb405b910 pc=0x66a04 created by runtime/pprof.TestCPUProfileMultithreaded.func1 /usr/local/go/src/runtime/pprof/pprof_test.go:100 +0x64 goroutine 1 [chan receive]: testing.(*T).Run(0x3fb40ba1a0, {0x1acff7, 0x1b}, 0x1b7750) /usr/local/go/src/testing/testing.go:1307 +0x3a8 testing.runTests.func1(0x3fb40ba1a0) /usr/local/go/src/testing/testing.go:1598 +0x84 testing.tRunner(0x3fb40ba1a0, 0x3fb409dd50) /usr/local/go/src/testing/testing.go:1259 +0xf8 testing.runTests(0x3fb400a030, {0x2cb500, 0x1e, 0x1e}, {0x0, 0x0, 0x0}) /usr/local/go/src/testing/testing.go:1596 +0x434 testing.(*M).Run(0x3fb40a4680) /usr/local/go/src/testing/testing.go:1504 +0x574 main.main() _testmain.go:101 +0x174 goroutine 6 [runnable]: runtime/pprof.cpuHog2(0x0) /usr/local/go/src/runtime/pprof/pprof_test.go:71 +0x1c runtime/pprof.cpuHogger(0x1b7940, 0x2fc268, 0x5f5e100) /usr/local/go/src/runtime/pprof/pprof_test.go:40 +0x5c runtime/pprof.TestCPUProfileMultithreaded.func1(0x5f5e100) /usr/local/go/src/runtime/pprof/pprof_test.go:104 +0x8c runtime/pprof.testCPUProfile(0x3fb40ba340, 0x1b79b8, {0x3fb4062f58, 0x2, 0x2}, {0x0, 0x0, 0x0}, 0x1b7748) /usr/local/go/src/runtime/pprof/pprof_test.go:324 +0x108 runtime/pprof.TestCPUProfileMultithreaded(0x3fb40ba340) /usr/local/go/src/runtime/pprof/pprof_test.go:98 +0xd0 testing.tRunner(0x3fb40ba340, 0x1b7750) /usr/local/go/src/testing/testing.go:1259 +0xf8 created by testing.(*T).Run /usr/local/go/src/testing/testing.go:1306 +0x388 goroutine 7 [sleep]: time.Sleep(0x5f5e100) /usr/local/go/src/runtime/time.go:193 +0x16c runtime/pprof.profileWriter({0x1e0400, 0x3fb400e8a0}) /usr/local/go/src/runtime/pprof/pprof.go:799 +0x68 created by runtime/pprof.StartCPUProfile /usr/local/go/src/runtime/pprof/pprof.go:784 +0x190 ERROR: exit status 2 ```
"fatal error: runtime: split stack overflow" ``` sipeed@sipeed:~$ cat /tmp/go-stress-20211121T161735-2006060682 runtime: newstack sp=0x3fa80098d0 stack=[0x3fa8062000, 0x3fa8063000] morebuf={pc:0x0 sp:0x3fa80098d0 lr:0x0} sched={pc:0x667b0 sp:0x3fa80098d0 lr:0x0 ctxt:0x0} runtime: gp=0x3fa80bc340, goid=6, gp->status=0x2 runtime: split stack overflow: 0x3fa80098d0 < 0x3fa8062000 fatal error: runtime: split stack overflow runtime stack: runtime.throw({0x1add89, 0x1d}) /usr/local/go/src/runtime/panic.go:1198 +0x60 runtime.newstack() /usr/local/go/src/runtime/stack.go:1032 +0x850 runtime.morestack() /usr/local/go/src/runtime/asm_riscv64.s:202 +0x70 goroutine 6 [running]: runtime: unexpected return pc for runtime.sigpanic called from 0x0 stack: frame={sp:0x3fa80098d0, fp:0x3fa80098d0} stack=[0x3fa8062000,0x3fa8063000) runtime.sigpanic() /usr/local/go/src/runtime/signal_unix.go:716 +0x10 fp=0x3fa80098d0 sp=0x3fa80098d0 pc=0x667b0 created by testing.(*T).Run /usr/local/go/src/testing/testing.go:1306 +0x388 goroutine 1 [chan receive]: testing.(*T).Run(0x3fa80bc000, {0x1acff7, 0x1b}, 0x1b7750) /usr/local/go/src/testing/testing.go:1307 +0x3a8 testing.runTests.func1(0x3fa80bc000) /usr/local/go/src/testing/testing.go:1598 +0x84 testing.tRunner(0x3fa80bc000, 0x3fa809dd50) /usr/local/go/src/testing/testing.go:1259 +0xf8 testing.runTests(0x3fa800a030, {0x2cb500, 0x1e, 0x1e}, {0x0, 0x0, 0x0}) /usr/local/go/src/testing/testing.go:1596 +0x434 testing.(*M).Run(0x3fa80a4680) /usr/local/go/src/testing/testing.go:1504 +0x574 main.main() _testmain.go:101 +0x174 goroutine 7 [sleep]: time.Sleep(0x5f5e100) /usr/local/go/src/runtime/time.go:193 +0x16c runtime/pprof.profileWriter({0x1e0400, 0x3fa800e8a0}) /usr/local/go/src/runtime/pprof/pprof.go:799 +0x68 created by runtime/pprof.StartCPUProfile /usr/local/go/src/runtime/pprof/pprof.go:784 +0x190 goroutine 8 [runnable]: runtime/pprof.cpuHog0(...) /usr/local/go/src/runtime/pprof/pprof_test.go:59 runtime/pprof.cpuHog1(0x0) /usr/local/go/src/runtime/pprof/pprof_test.go:54 +0xc runtime/pprof.cpuHogger(0x1b7938, 0x2fc260, 0x5f5e100) /usr/local/go/src/runtime/pprof/pprof_test.go:40 +0x5c runtime/pprof.TestCPUProfileMultithreaded.func1.1(0x5f5e100, 0x3fa805c3c0) /usr/local/go/src/runtime/pprof/pprof_test.go:101 +0x44 created by runtime/pprof.TestCPUProfileMultithreaded.func1 /usr/local/go/src/runtime/pprof/pprof_test.go:100 +0x64 ERROR: exit status 2 ```
rhysh commented 2 years ago

This looks like a problem with linux/riscv64 signal handling, not specific to the cpu profiler: the tests for the os/signal package see similar crashes (though because of the test structure, it's in a subprocess rather than the test runner itself).

$ GOOS=linux GOARCH=riscv64 go test -c -o /tmp/signal.go1.17.3.test os/signal
$ go version /tmp/signal.go1.17.3.test 
/tmp/signal.go1.17.3.test: go1.17.3
$ # scp and ssh to the linux/riscv64 host

sipeed@sipeed:~$ nohup ./stress ./signal.go1.17.3.test -test.short >> ./nohup.log &
sipeed@sipeed:~$ tail -n 1 ./nohup.log 
3h18m50s: 2809 runs so far, 3 failures (0.11%)

One of the failures is fatal error: unexpected signal during runtime execution. Two of them are fatal error: runtime: split stack overflow.

"fatal error: unexpected signal during runtime execution" ``` sipeed@sipeed:~$ cat /tmp/go-stress-20211121T203003-1328641936 --- FAIL: TestNohup (1.75s) --- FAIL: TestNohup/nohup-2 (0.06s) signal_test.go:504: ran test with -send_uncaught_sighup=2 under nohup and it failed: expected success. Error: exit status 2 Output: === RUN TestStop === RUN TestStop/window_changed === PAUSE TestStop/window_changed === RUN TestStop/hangup === PAUSE TestStop/hangup === RUN TestStop/user_defined_signal_1 === PAUSE TestStop/user_defined_signal_1 === CONT TestStop/window_changed fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x2 pc=0x2] goroutine 0 [idle]: runtime: unexpected return pc for runtime.sigpanic called from 0x2 stack: frame={sp:0x3fa80097d0, fp:0x3fa8009800} stack=[0x3fa8002000,0x3fa800a000) 0x0000003fa80096d0: 0x0000000000000000 0x000000000004cb64 0x0000003fa80096e0: 0x000000000004c8b8 0x0000003fa80097a8 0x0000003fa80096f0: 0x0000000000000000 0x0000003fa8000000 0x0000003fa8009700: 0x010000000004ce9c 0x0000000000000004 0x0000003fa8009710: 0x000000000000001f 0x0000000000000002 0x0000003fa8009720: 0x0000000000000002 0x0000000000000001 0x0000003fa8009730: 0x0000000000163171 0x000000000004caf4 0x0000003fa8009740: 0x0000003fa8000000 0x000000000004c8b8 0x0000003fa8009750: 0x0000003fa80097a8 0x000000000004e4d0 0x0000003fa8009760: 0x0000003fa80097a8 0x000000000004c8b8 0x0000003fa8009770: 0x0000003fa8000000 0x000000000004c8b8 0x0000003fa8009780: 0x0000003fa8009788 0x000000000004cb08 0x0000003fa8009790: 0x0000003fa8000000 0x000000000004c8b8 0x0000003fa80097a0: 0x0000003fa80097a8 0x0000000000066f6c 0x0000003fa80097b0: 0x0000003fa80097b8 0x000000000004c8d0 0x0000003fa80097c0: 0x00000000001659bf 0x000000000000002a 0x0000003fa80097d0: <0x0000000000000002 0x00000000001659bf 0x0000003fa80097e0: 0x000000000000002a 0x0000000000000080 0x0000003fa80097f0: 0x0000000000000003 0x0000000000000000 0x0000003fa8009800: >0x0000000000000003 0x00000000000664c8 0x0000003fa8009810: 0x000000000008f2d4 0x0000000000214750 0x0000003fa8009820: 0x0000000000236c80 0x0000000000066004 0x0000003fa8009830: 0x000000000008f2d4 0x0000000000000000 0x0000003fa8009840: 0x0000000000000000 0x0000000000000000 0x0000003fa8009850: 0x0000000000000000 0x0000000000000000 0x0000003fa8009860: 0x0000010100000000 0x0000000000000000 0x0000003fa8009870: 0x0000000000000000 0x0000000000065f98 0x0000003fa8009880: 0x0000000000000000 0x0000000000000000 0x0000003fa8009890: 0x0000003fa80098e0 0x0000000000065e90 0x0000003fa80098a0: 0x000000000000001c 0x0000003fa8000000 0x0000003fa80098b0: 0x0000000000000000 0x0000000000000000 0x0000003fa80098c0: 0x0000000000000000 0x0000003fa8009990 0x0000003fa80098d0: 0x0000003fa8009a10 0x0000000000084bcc 0x0000003fa80098e0: 0x0000003f0000001c 0x0000003fa8009990 0x0000003fa80098f0: 0x0000003fa8009a10 0x0000003fa808a9c0 runtime.throw({0x1659bf, 0x2a}) /usr/local/go/src/runtime/panic.go:1198 +0x60 runtime: unexpected return pc for runtime.sigpanic called from 0x2 stack: frame={sp:0x3fa80097d0, fp:0x3fa8009800} stack=[0x3fa8002000,0x3fa800a000) 0x0000003fa80096d0: 0x0000000000000000 0x000000000004cb64 0x0000003fa80096e0: 0x000000000004c8b8 0x0000003fa80097a8 0x0000003fa80096f0: 0x0000000000000000 0x0000003fa8000000 0x0000003fa8009700: 0x010000000004ce9c 0x0000000000000004 0x0000003fa8009710: 0x000000000000001f 0x0000000000000002 0x0000003fa8009720: 0x0000000000000002 0x0000000000000001 0x0000003fa8009730: 0x0000000000163171 0x000000000004caf4 0x0000003fa8009740: 0x0000003fa8000000 0x000000000004c8b8 0x0000003fa8009750: 0x0000003fa80097a8 0x000000000004e4d0 0x0000003fa8009760: 0x0000003fa80097a8 0x000000000004c8b8 0x0000003fa8009770: 0x0000003fa8000000 0x000000000004c8b8 0x0000003fa8009780: 0x0000003fa8009788 0x000000000004cb08 0x0000003fa8009790: 0x0000003fa8000000 0x000000000004c8b8 0x0000003fa80097a0: 0x0000003fa80097a8 0x0000000000066f6c 0x0000003fa80097b0: 0x0000003fa80097b8 0x000000000004c8d0 0x0000003fa80097c0: 0x00000000001659bf 0x000000000000002a 0x0000003fa80097d0: <0x0000000000000002 0x00000000001659bf 0x0000003fa80097e0: 0x000000000000002a 0x0000000000000080 0x0000003fa80097f0: 0x0000000000000003 0x0000000000000000 0x0000003fa8009800: >0x0000000000000003 0x00000000000664c8 0x0000003fa8009810: 0x000000000008f2d4 0x0000000000214750 0x0000003fa8009820: 0x0000000000236c80 0x0000000000066004 0x0000003fa8009830: 0x000000000008f2d4 0x0000000000000000 0x0000003fa8009840: 0x0000000000000000 0x0000000000000000 0x0000003fa8009850: 0x0000000000000000 0x0000000000000000 0x0000003fa8009860: 0x0000010100000000 0x0000000000000000 0x0000003fa8009870: 0x0000000000000000 0x0000000000065f98 0x0000003fa8009880: 0x0000000000000000 0x0000000000000000 0x0000003fa8009890: 0x0000003fa80098e0 0x0000000000065e90 0x0000003fa80098a0: 0x000000000000001c 0x0000003fa8000000 0x0000003fa80098b0: 0x0000000000000000 0x0000000000000000 0x0000003fa80098c0: 0x0000000000000000 0x0000003fa8009990 0x0000003fa80098d0: 0x0000003fa8009a10 0x0000000000084bcc 0x0000003fa80098e0: 0x0000003f0000001c 0x0000003fa8009990 0x0000003fa80098f0: 0x0000003fa8009a10 0x0000003fa808a9c0 runtime.sigpanic() /usr/local/go/src/runtime/signal_unix.go:719 +0x29c goroutine 7 [running]: syscall.Syscall(0x40, 0xa205c, 0x3fa8018080, 0x22) /usr/local/go/src/syscall/asm_linux_riscv64.s:13 +0x10 fp=0x3fa802e438 sp=0x3fa802e430 pc=0x8f1d0 syscall.write(0x3fa80480c0, {0x3fa8018080, 0x22, 0x40}) /usr/local/go/src/syscall/zsyscall_linux_riscv64.go:915 +0x5c fp=0x3fa802e488 sp=0x3fa802e438 pc=0x8e80c syscall.Write(...) /usr/local/go/src/syscall/syscall_unix.go:214 internal/poll.ignoringEINTRIO(...) /usr/local/go/src/internal/poll/fd_unix.go:582 internal/poll.(*FD).Write(0x3fa80480c0, {0x3fa8018080, 0x22, 0x40}) /usr/local/go/src/internal/poll/fd_unix.go:275 +0x310 fp=0x3fa802e520 sp=0x3fa802e488 pc=0xa3078 os.(*File).write(...) /usr/local/go/src/os/file_posix.go:49 os.(*File).Write(0x3fa8068f70, {0x3fa8018080, 0x22, 0x40}) /usr/local/go/src/os/file.go:176 +0x7c fp=0x3fa802e5a8 sp=0x3fa802e520 pc=0xa4f54 fmt.Fprintf({0x3fa8066340, 0x8157c}, {0x2b4f5, 0x18fd8b9d}, {0x3fa802e6d8, 0x9779c, 0xa1578e28dd9d}) /usr/local/go/src/fmt/print.go:205 +0x8c fp=0x3fa802e608 sp=0x3fa802e5a8 pc=0xc132c testing.(*chattyPrinter).Updatef(0xc05ed10f12e576fa, {0xdac1ad, 0x24e1c0}, {0x15e8e7, 0x11bf20}, {0x81, 0x803ab, 0x1c}) /usr/local/go/src/testing/testing.go:367 +0xfc fp=0x3fa802e680 sp=0x3fa802e608 pc=0xd884c testing.(*T).Parallel(0x803ab) /usr/local/go/src/testing/testing.go:1108 +0x290 fp=0x3fa802e6e8 sp=0x3fa802e680 pc=0xdb3e0 os/signal.TestStop.func1(0x3fa808a820) /usr/local/go/src/os/signal/signal_test.go:349 +0x38 fp=0x3fa802e780 sp=0x3fa802e6e8 pc=0x11be90 testing.tRunner(0x3fa808a820, 0x3fa8012170) /usr/local/go/src/testing/testing.go:1259 +0xf8 fp=0x3fa802e7c8 sp=0x3fa802e780 pc=0xdb568 runtime.goexit() /usr/local/go/src/runtime/asm_riscv64.s:507 +0x4 fp=0x3fa802e7c8 sp=0x3fa802e7c8 pc=0x83714 created by testing.(*T).Run /usr/local/go/src/testing/testing.go:1306 +0x388 goroutine 1 [chan receive]: testing.(*T).Run(0x3fa808a340, {0x15d665, 0x8}, 0x168300) /usr/local/go/src/testing/testing.go:1307 +0x3a8 testing.runTests.func1(0x3fa808a340) /usr/local/go/src/testing/testing.go:1598 +0x84 testing.tRunner(0x3fa808a340, 0x3fa803dd50) /usr/local/go/src/testing/testing.go:1259 +0xf8 testing.runTests(0x3fa800a030, {0x24a980, 0x13, 0x13}, {0x0, 0x0, 0x0}) /usr/local/go/src/testing/testing.go:1596 +0x434 testing.(*M).Run(0x3fa808c000) /usr/local/go/src/testing/testing.go:1504 +0x574 main.main() _testmain.go:83 +0x174 goroutine 6 [chan receive]: testing.tRunner.func1(0x3fa808a4e0) /usr/local/go/src/testing/testing.go:1225 +0x420 testing.tRunner(0x3fa808a4e0, 0x168300) /usr/local/go/src/testing/testing.go:1265 +0x14c created by testing.(*T).Run /usr/local/go/src/testing/testing.go:1306 +0x388 goroutine 8 [runnable]: testing.(*T).Parallel(0x3fa808ab60) /usr/local/go/src/testing/testing.go:1104 +0x1f0 os/signal.TestStop.func1(0x3fa808ab60) /usr/local/go/src/os/signal/signal_test.go:349 +0x38 testing.tRunner(0x3fa808ab60, 0x3fa80121b0) /usr/local/go/src/testing/testing.go:1259 +0xf8 created by testing.(*T).Run /usr/local/go/src/testing/testing.go:1306 +0x388 goroutine 9 [runnable]: testing.(*T).Parallel(0x3fa808aea0) /usr/local/go/src/testing/testing.go:1104 +0x1f0 os/signal.TestStop.func1(0x3fa808aea0) /usr/local/go/src/os/signal/signal_test.go:349 +0x38 testing.tRunner(0x3fa808aea0, 0x3fa80121f0) /usr/local/go/src/testing/testing.go:1259 +0xf8 created by testing.(*T).Run /usr/local/go/src/testing/testing.go:1306 +0x388 FAIL ERROR: exit status 1 ```
"fatal error: runtime: split stack overflow" ``` sipeed@sipeed:~$ cat /tmp/go-stress-20211121T203003-2319835814 --- FAIL: TestNohup (1.95s) --- FAIL: TestNohup/nohup-1 (0.06s) signal_test.go:504: ran test with -send_uncaught_sighup=1 under nohup and it failed: expected success. Error: exit status 2 Output: === RUN TestStop === RUN TestStop/window_changed === PAUSE TestStop/window_changed === RUN TestStop/hangup === PAUSE TestStop/hangup === RUN TestStop/user_defined_signal_1 === PAUSE TestStop/user_defined_signal_1 === CONT TestStop/window_changed runtime: newstack sp=0x3fb0009940 stack=[0x3fb002e000, 0x3fb002e800] morebuf={pc:0x0 sp:0x3fb0009940 lr:0x0} sched={pc:0x66ce0 sp:0x3fb0009940 lr:0x0 ctxt:0x0} runtime: gp=0x3fb0098340, goid=7, gp->status=0x2 runtime: split stack overflow: 0x3fb0009940 < 0x3fb002e000 fatal error: runtime: split stack overflow runtime stack: runtime.throw({0x162bf4, 0x1d}) /usr/local/go/src/runtime/panic.go:1198 +0x60 runtime.newstack() /usr/local/go/src/runtime/stack.go:1032 +0x850 runtime.morestack() /usr/local/go/src/runtime/asm_riscv64.s:202 +0x70 goroutine 7 [running]: runtime: unexpected return pc for runtime.sigpanic called from 0x0 stack: frame={sp:0x3fb0009940, fp:0x3fb0009940} stack=[0x3fb002e000,0x3fb002e800) runtime.sigpanic() /usr/local/go/src/runtime/signal_unix.go:716 +0x10 fp=0x3fb0009940 sp=0x3fb0009940 pc=0x66ce0 created by testing.(*T).Run /usr/local/go/src/testing/testing.go:1306 +0x388 goroutine 1 [chan receive]: testing.(*T).Run(0x3fb0001ba0, {0x15d665, 0x8}, 0x168300) /usr/local/go/src/testing/testing.go:1307 +0x3a8 testing.runTests.func1(0x3fb0001ba0) /usr/local/go/src/testing/testing.go:1598 +0x84 testing.tRunner(0x3fb0001ba0, 0x3fb003bd50) /usr/local/go/src/testing/testing.go:1259 +0xf8 testing.runTests(0x3fb000a030, {0x24a980, 0x13, 0x13}, {0x0, 0x0, 0x0}) /usr/local/go/src/testing/testing.go:1596 +0x434 testing.(*M).Run(0x3fb0082000) /usr/local/go/src/testing/testing.go:1504 +0x574 main.main() _testmain.go:83 +0x174 goroutine 6 [chan receive]: testing.tRunner.func1(0x3fb0001d40) /usr/local/go/src/testing/testing.go:1225 +0x420 testing.tRunner(0x3fb0001d40, 0x168300) /usr/local/go/src/testing/testing.go:1265 +0x14c created by testing.(*T).Run /usr/local/go/src/testing/testing.go:1306 +0x388 goroutine 8 [runnable]: testing.(*T).Parallel(0x3fb00984e0) /usr/local/go/src/testing/testing.go:1104 +0x1f0 os/signal.TestStop.func1(0x3fb00984e0) /usr/local/go/src/os/signal/signal_test.go:349 +0x38 testing.tRunner(0x3fb00984e0, 0x3fb00121b0) /usr/local/go/src/testing/testing.go:1259 +0xf8 created by testing.(*T).Run /usr/local/go/src/testing/testing.go:1306 +0x388 goroutine 9 [runnable]: testing.(*T).Parallel(0x3fb0098820) /usr/local/go/src/testing/testing.go:1104 +0x1f0 os/signal.TestStop.func1(0x3fb0098820) /usr/local/go/src/os/signal/signal_test.go:349 +0x38 testing.tRunner(0x3fb0098820, 0x3fb00121f0) /usr/local/go/src/testing/testing.go:1259 +0xf8 created by testing.(*T).Run /usr/local/go/src/testing/testing.go:1306 +0x388 FAIL ERROR: exit status 1 ```
mengzhuo commented 2 years ago

kindly cc @cherrymui @4a6f656c @prattmic

cherrymui commented 2 years ago

Yeah, it does look like signal related. Is it reproducible at tip? Thanks.

rhysh commented 2 years ago

Yes, the runtime/pprof crashes reproduce at tip. It's very fast, I got 10 failures in under 4 minutes. The crashes in os/signal are slower, less than one per hour on go1.17.3. I'll leave it running.

$ ~/gotip/bin/go version
go version devel go1.18-cd0bf38966 Mon Nov 22 16:53:57 2021 +0000 darwin/amd64
$ GOOS=linux GOARCH=riscv64 ~/gotip/bin/go test -c -o /tmp/signal.go1.18-devel-cd0bf38966.test os/signal
$ GOOS=linux GOARCH=riscv64 ~/gotip/bin/go test -c -o /tmp/pprof.go1.18-devel-cd0bf38966.test runtime/pprof
$ go version /tmp/*devel*.test
/tmp/pprof.go1.18-devel-cd0bf38966.test: devel go1.18-cd0bf38966 Mon Nov 22 16:53:57 2021 +0000
/tmp/signal.go1.18-devel-cd0bf38966.test: devel go1.18-cd0bf38966 Mon Nov 22 16:53:57 2021 +0000

# scp, ssh

sipeed@sipeed:~$ nohup ./stress ./pprof.go1.18-devel-cd0bf38966.test -test.run=TestCPUProfileMultithreaded -test.short > ./nohup.pprof.cd0bf38966.log &
sipeed@sipeed:~$ tac ./nohup.pprof.cd0bf38966.log | grep failures | head -n 1
3m15s: 192 runs so far, 10 failures (5.21%)
sipeed@sipeed:~$ (for f in /tmp/go-stress-20211122T181250-* ; do grep fatal $f | head -n 1 ; done) | sort | uniq -c | head
      8 fatal error: unexpected signal during runtime execution
      2 fatal error: unknown caller pc

sipeed@sipeed:~$ nohup ./stress ./signal.go1.18-devel-cd0bf38966.test -test.short > ./nohup.signal.cd0bf38966.log &
sipeed@sipeed:~$ tac ./nohup.signal.cd0bf38966.log | grep failures | head -n 1
9m35s: 136 runs so far, 0 failures
rhysh commented 2 years ago

I haven't seen any os/signal failures yet at tip, and with the go1.17.3 failure rate I'd expect "about 6" by now:

sipeed@sipeed:~$ tac ./nohup.signal.cd0bf38966.log | grep failures | head -n 1
6h39m45s: 5692 runs so far, 0 failures
rhysh commented 2 years ago

The os/signal failures reproduce at tip, but very slowly (10 hours):

sipeed@sipeed:~$ tac ./nohup.signal.cd0bf38966.log | grep failures | head -n 1
21h21m25s: 18251 runs so far, 2 failures (0.01%)
sipeed@sipeed:~$ (for f in /tmp/go-stress-20211122T181657-* ; do grep fatal $f | head -n 1 ; done) | sort | uniq -c | head
      1             fatal error: runtime: split stack overflow
      1             fatal error: unexpected signal during runtime execution
4a6f656c commented 2 years ago

Yes, the runtime/pprof crashes reproduce at tip. It's very fast, I got 10 failures in under 4 minutes. The crashes in os/signal are slower, less than one per hour on go1.17.3. I'll leave it running.

I'm not able to reproduce this here on a SiFive HiFive Unleashed:

$ ../bin/go version
go version devel go1.18-00045b76e5 Tue Nov 23 17:30:32 2021 +0000 linux/riscv64
$ ../bin/go test -c -o /tmp/pprof.test runtime/pprof
$ while /tmp/pprof.test -test.run=TestCPUProfileMultithreaded -test.short ; do : ; done

That ran for several hours without failing - @mengzhuo can you test on your SiFive HiFive Unmatched?

mengzhuo commented 2 years ago

@4a6f656c Joel, I ran this test for about 1 hour and no failure occurred. I notice that the CPU of Nezha is only one core. Should we disable the multi-thread tests for single core CPU?


some updates, It runs 18 hours and no failures.

Yes, the runtime/pprof crashes reproduce at tip. It's very fast, I got 10 failures in under 4 minutes. The crashes in os/signal are slower, less than one per hour on go1.17.3. I'll leave it running.

I'm not able to reproduce this here on a SiFive HiFive Unleashed:

$ ../bin/go version
go version devel go1.18-00045b76e5 Tue Nov 23 17:30:32 2021 +0000 linux/riscv64
$ ../bin/go test -c -o /tmp/pprof.test runtime/pprof
$ while /tmp/pprof.test -test.run=TestCPUProfileMultithreaded -test.short ; do : ; done

That ran for several hours without failing - @mengzhuo can you test on your SiFive HiFive Unmatched?

rhysh commented 2 years ago

Should we disable the multi-thread tests for single core CPU?

It's a crash, not a test failure, so I don't think that disabling the tests is the right thing to do.

The TestCPUProfileMultithreaded case sets GOMAXPROCS to 2. On multi-core machines, does the crash reproduce when taskset or similar restricts all threads to run on a single core?


I ran the stress test for several hours with async preemption disabled and saw zero failures:

sipeed@sipeed:~$ nohup env GODEBUG=asyncpreemptoff=1 ./stress ./pprof.go1.18-devel-cd0bf38966.test -test.run='^TestCPUProfileMultithreaded$' -test.short > ./nohup.pprof.cd0bf38966.asyncpreemptoff.log &

# wait a while

sipeed@sipeed:~$ tac ./nohup.pprof.cd0bf38966.asyncpreemptoff.log | grep failures | head -n 1
5h23m0s: 15074 runs so far, 0 failures

Some of the failures I see, which complain of fatal error: unexpected signal during runtime execution, happen soon after a SIGPROF signal arrives on a thread that is already running on the gsignal stack to handle SIGURG, meaning that inside the sighandler function, gp == _g_.m.gsignal.

The other two failure modes, fatal error: runtime: split stack overflow and fatal error: unknown caller pc, seem like they could also be caused by stack-sharing: one execution using the stack from where the other left off leading to larger-than-expected stack use, or one clobbering the return values that the other had saved.

What, if anything, would make it safe for the signal handlers to be re-entrant, and/or to share the M's gsignal stack? I haven't seen these failures on linux/amd64 (though I don't typically use single-core machines).


I added some debuglog to the signal handler; this is what I see. Atop go1.17.3: ``` diff --git a/src/runtime/signal_unix.go b/src/runtime/signal_unix.go index 6096760b50..63f7b1f8ba 100644 --- a/src/runtime/signal_unix.go +++ b/src/runtime/signal_unix.go @@ -538,6 +538,17 @@ func sighandler(sig uint32, info *siginfo, ctxt unsafe.Pointer, gp *g) { _g_ := getg() c := &sigctxt{info, ctxt} + if sig == 11 || gp == _g_.m.gsignal { + log := dlog().s("sighandler").s("m").i64(_g_.m.id).s("sig").u32(sig).s("pc").hex(uint64(c.sigpc())).s("g").i64(_g_.goid) + if _g_.m.curg != nil { + log = log.s("curg").i64(_g_.m.curg.goid) + } + if gp != nil { + log = log.s("gp").i64(gp.goid) + } + log.end() + } + if sig == _SIGPROF { sigprof(c.sigpc(), c.sigsp(), c.siglr(), gp, _g_.m) return ``` ``` $ GOOS=linux GOARCH=riscv64 ./bin/go test -c -o /tmp/pprof.go1.17-edit.test -tags 'debuglog' runtime/pprof # scp, ssh sipeed@sipeed:~$ ./stress ./pprof.go1.17-edit.test -test.run='^TestCPUProfileMultithreaded$' -test.short sipeed@sipeed:~$ for f in /tmp/go-stress-20211126T234753-* ; do echo $f ; cat $f | egrep "^(fatal error|\[)" ; echo ; done /tmp/go-stress-20211126T234753-1671273976 fatal error: unknown caller pc [0.137608292 P 1] sighandler m 3 sig 11 pc 0x0 g 0 curg 8 gp 8 /tmp/go-stress-20211126T234753-2730224263 fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x2 pc=0x2] fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78ef8] [0.729127202 P 0] sighandler m 0 sig 27 pc 0x70bcc g 0 curg 6 gp 0 [0.729284995 P 0] sighandler m 0 sig 11 pc 0x2 g 0 curg 6 gp 0 [0.736798263 P 0] sighandler m 0 sig 11 pc 0x78ef8 g 0 curg 6 gp 0 [0.739702328 P 0] sighandler m 0 sig 27 pc 0x8608c g 0 curg 6 gp 0 /tmp/go-stress-20211126T234753-32012639 fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0] fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78ef8] [0.118312684 P 1] sighandler m 3 sig 27 pc 0x71cc8 g 0 curg 8 gp 0 [0.118463435 P 1] sighandler m 3 sig 11 pc 0x0 g 0 curg 8 gp 0 [0.126182496 P 1] sighandler m 3 sig 11 pc 0x78ef8 g 0 curg 8 gp 0 [0.128827642 P 1] sighandler m 3 sig 27 pc 0x8608c g 0 curg 8 gp 0 /tmp/go-stress-20211126T234753-3518561130 fatal error: runtime: split stack overflow [0.562301991 P 0] sighandler m 0 sig 11 pc 0x0 g 0 curg 6 gp 6 /tmp/go-stress-20211126T234753-3944482246 fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x80 pc=0x80] fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78ef8] [0.540990792 P 1] sighandler m 2 sig 27 pc 0x4f86c g 0 curg 8 gp 0 [0.541141210 P 1] sighandler m 2 sig 11 pc 0x80 g 0 curg 8 gp 0 [0.548668404 P 1] sighandler m 2 sig 11 pc 0x78ef8 g 0 curg 8 gp 0 [0.551650807 P 1] sighandler m 2 sig 27 pc 0x8608c g 0 curg 8 gp 0 /tmp/go-stress-20211126T234753-3953261365 fatal error: runtime: split stack overflow [0.207116710 P 0] sighandler m 0 sig 11 pc 0x0 g 0 curg 6 gp 6 /tmp/go-stress-20211126T234753-873267354 fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0] fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78ef8] [0.639896662 P 0] sighandler m 0 sig 27 pc 0x71cc8 g 0 curg 6 gp 0 [0.640053955 P 0] sighandler m 0 sig 11 pc 0x0 g 0 curg 6 gp 0 [0.647703639 P 0] sighandler m 0 sig 11 pc 0x78ef8 g 0 curg 6 gp 0 [0.650338701 P 0] sighandler m 0 sig 27 pc 0x71ed8 g 0 curg 6 gp 0 /tmp/go-stress-20211126T234753-946845684 fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x0] fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78ef8] [0.023616104 P 0] sighandler m 0 sig 23 pc 0x78cb8 g 0 curg 6 gp 0 [0.023966732 P 0] sighandler m 0 sig 23 pc 0x61d3c g 0 curg 6 gp 0 [0.024075024 P 0] sighandler m 0 sig 23 pc 0x61d3c g 0 curg 6 gp 0 [0.583399559 P 0] sighandler m 0 sig 27 pc 0x71cc8 g 0 curg 6 gp 0 [0.583484268 P 0] sighandler m 0 sig 11 pc 0x0 g 0 curg 6 gp 0 [0.591338775 P 0] sighandler m 0 sig 11 pc 0x78ef8 g 0 curg 6 gp 0 [0.594175667 P 0] sighandler m 0 sig 27 pc 0x8608c g 0 curg 6 gp 0 /tmp/go-stress-20211126T234753-948494023 fatal error: runtime: split stack overflow [0.128850956 P 0] sighandler m 0 sig 11 pc 0x0 g 0 curg 6 gp 6 /tmp/go-stress-20211126T234753-976614241 fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x80 pc=0x80] fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78ef8] [0.184631256 P 0] sighandler m 0 sig 27 pc 0x4f86c g 0 curg 6 gp 0 [0.184781716 P 0] sighandler m 0 sig 11 pc 0x80 g 0 curg 6 gp 0 [0.192525895 P 0] sighandler m 0 sig 11 pc 0x78ef8 g 0 curg 6 gp 0 [0.195321040 P 0] sighandler m 0 sig 27 pc 0x8608c g 0 curg 6 gp 0 # and the addresses that gp was running when its M received signal 27 (SIGPROF) $ echo 0x4f86c | go tool addr2line /tmp/pprof.go1.17-edit.test runtime.isAbortPC /usr/local/go/src/runtime/panic.go:1492 $ echo 0x71cc8 | go tool addr2line /tmp/pprof.go1.17-edit.test runtime.pcdatavalue2 /usr/local/go/src/runtime/symtab.go:1005 ```
cherrymui commented 2 years ago

@rhysh thanks for the logs! If the signal handler is reentrant, or the signal stack is somehow shared, I think it is very wrong. Could you print the stack pointer at entry of signal handler to see if it is running on the same stack?

cherrymui commented 2 years ago

gp == _g_.m.gsignal

I think that condition should never be true. This could either due to that we're getting nested signals or the G is somehow wrong. As we got signal PC in runtime.pcdatavalue2, which is only called from the signal handler, it indicates that we're actually getting nested signals, which should not happen as we block signals when running the signal handler.

What kernel version are you running on? @4a6f656c and @mengzhuo , what kernel version does the builder running?

rhysh commented 2 years ago

Yes, the variation in the result of getcallersp makes it looks like nested signals. I'm running the OS installation that came with the machine, kernel 5.4.61.

sipeed@sipeed:~$ uname -a
Linux sipeed 5.4.61 #15 PREEMPT Thu Jun 10 06:50:03 UTC 2021 riscv64 GNU/Linux

In a crash from "fatal error: unexpected signal during runtime execution", it looks like M0 processes SIGURG and SIGPROF always with SP 0x3fa80098d8. On M3, it usually processes those with SP 0x3fa805b8d8. But then it gets a SIGPROF/27 on PC 0x16c34 (runtime.(*debugLogWriter).varint, hmm) and the SP for that call to runtime.sighandler is 0x3fa805b0c8, a 0x810 or 2064-byte delta from the usual place.

We don't see a log line from a competing SIGURG/23 delivery, but we do see that the SIGPROF/27 signal came when the PC was inside a function that only the signal handler would call. And we see a split in the debuglog output (">> begin log 0 <<"), which seems to mean that something else (the signal handler) was using the debuglog that's usually in the global pool (between calls to dlog and end) which meant the SIGPROF/27 handler needed to allocate another buffer.

sipeed@sipeed:~$ cat /tmp/go-stress-20211129T195813-1450030122 | grep 'sp 0x' | cut -d' ' -f 5-10 | sort | uniq -c
      9 m 0 sp 0x3fa80098d8 sig 23
      7 m 0 sp 0x3fa80098d8 sig 27
      1 m 3 sp 0x3fa805a378 sig 27
      1 m 3 sp 0x3fa805aa48 sig 11
      1 m 3 sp 0x3fa805b0c8 sig 11
      1 m 3 sp 0x3fa805b0c8 sig 27
      7 m 3 sp 0x3fa805b8d8 sig 23
      7 m 3 sp 0x3fa805b8d8 sig 27
Here's the current diff from go1.17.3 ``` $ git diff diff --git a/src/runtime/signal_unix.go b/src/runtime/signal_unix.go index 6096760b50..b17a001ac6 100644 --- a/src/runtime/signal_unix.go +++ b/src/runtime/signal_unix.go @@ -538,6 +538,19 @@ func sighandler(sig uint32, info *siginfo, ctxt unsafe.Pointer, gp *g) { _g_ := getg() c := &sigctxt{info, ctxt} + if sig == 11 || gp == _g_.m.gsignal || true { + log := dlog().s("sighandler") + log = log.s("m").i64(_g_.m.id).s("sp").hex(uint64(getcallersp())) + log = log.s("sig").u32(sig).s("pc").hex(uint64(c.sigpc())).s("g").i64(_g_.goid) + if _g_.m.curg != nil { + log = log.s("curg").i64(_g_.m.curg.goid) + } + if gp != nil { + log = log.s("gp").i64(gp.goid) + } + log.end() + } + if sig == _SIGPROF { sigprof(c.sigpc(), c.sigsp(), c.siglr(), gp, _g_.m) return ```
And the crash log from stress (including debuglog output) ``` sipeed@sipeed:~$ cat /tmp/go-stress-20211129T195813-1450030122 fatal error: unexpected signal during runtime execution [signal SIGSEGV: segmentation violation code=0x1 addr=0x80 pc=0x80] goroutine 0 [idle]: runtime: unexpected return pc for runtime.sigpanic called from 0x80 stack: frame={sp:0x3fa805b7b8, fp:0x3fa805b7e8} stack=[0x3fa8054000,0x3fa805c000) 0x0000003fa805b6b8: 0x0000000000000000 0x000000000004f024 0x0000003fa805b6c8: 0x000000000004ed78 0x0000003fa805b790 0x0000003fa805b6d8: 0x0000000000000000 0x0000003fa8000ea0 0x0000003fa805b6e8: 0x010000000004f35c 0x0000000000000004 0x0000003fa805b6f8: 0x000000000000001f 0x0000000000000080 0x0000003fa805b708: 0x0000000000000080 0x0000000000000001 0x0000003fa805b718: 0x00000000001aed2c 0x000000000004efb4 0x0000003fa805b728: 0x0000003fa8000ea0 0x000000000004ed78 0x0000003fa805b738: 0x0000003fa805b790 0x0000000000050998 0x0000003fa805b748: 0x0000003fa805b790 0x000000000004ed78 0x0000003fa805b758: 0x0000003fa8000ea0 0x000000000004ed78 0x0000003fa805b768: 0x0000003fa805b770 0x000000000004efc8 0x0000003fa805b778: 0x0000003fa8000ea0 0x000000000004ed78 0x0000003fa805b788: 0x0000003fa805b790 0x000000000006913c 0x0000003fa805b798: 0x0000003fa805b7a0 0x000000000004ed90 0x0000003fa805b7a8: 0x00000000001b200b 0x000000000000002a 0x0000003fa805b7b8: <0x0000000000000080 0x00000000001b200b 0x0000003fa805b7c8: 0x000000000000002a 0x0000000000000000 0x0000003fa805b7d8: 0x0000000000000080 0x0000000000000003 0x0000003fa805b7e8: >0x0000000000000080 0x0000000000068304 0x0000003fa805b7f8: 0x0000003fac01b000 0x0000000000000001 0x0000003fa805b808: 0x00030cb3a06830c3 0x00030cb3a06830c3 0x0000003fa805b818: 0x0000003fac01b000 0x000012bc998f4621 0x0000003fa805b828: 0x0000000000067f94 0x000000000015ae20 0x0000003fa805b838: 0x0000003fa802ef58 0x000000000015ada4 0x0000003fa805b848: 0x0000003fa80c0680 0x0000003fa8030900 0x0000003fa805b858: 0x0000000000000000 0x0000010100000000 0x0000003fa805b868: 0x0000000000000000 0x0000000000000000 0x0000003fa805b878: 0x0000000000067f28 0x0000000000000000 0x0000003fa805b888: 0x0000000000000000 0x0000003fa805b8e0 0x0000003fa805b898: 0x0000000000067e20 0x0000000000000017 0x0000003fa805b8a8: 0x0000003fa8000ea0 0x0000000000000000 0x0000003fa805b8b8: 0x0000000000000000 0x0000000000000000 0x0000003fa805b8c8: 0x0000003fa805b990 0x0000003fa805ba10 0x0000003fa805b8d8: 0x00000000000862cc 0x0000003f00000017 runtime.throw({0x1b200b, 0x2a}) /usr/local/go/src/runtime/panic.go:1198 +0x60 runtime: unexpected return pc for runtime.sigpanic called from 0x80 stack: frame={sp:0x3fa805b7b8, fp:0x3fa805b7e8} stack=[0x3fa8054000,0x3fa805c000) 0x0000003fa805b6b8: 0x0000000000000000 0x000000000004f024 0x0000003fa805b6c8: 0x000000000004ed78 0x0000003fa805b790 0x0000003fa805b6d8: 0x0000000000000000 0x0000003fa8000ea0 0x0000003fa805b6e8: 0x010000000004f35c 0x0000000000000004 0x0000003fa805b6f8: 0x000000000000001f 0x0000000000000080 0x0000003fa805b708: 0x0000000000000080 0x0000000000000001 0x0000003fa805b718: 0x00000000001aed2c 0x000000000004efb4 0x0000003fa805b728: 0x0000003fa8000ea0 0x000000000004ed78 0x0000003fa805b738: 0x0000003fa805b790 0x0000000000050998 0x0000003fa805b748: 0x0000003fa805b790 0x000000000004ed78 0x0000003fa805b758: 0x0000003fa8000ea0 0x000000000004ed78 0x0000003fa805b768: 0x0000003fa805b770 0x000000000004efc8 0x0000003fa805b778: 0x0000003fa8000ea0 0x000000000004ed78 0x0000003fa805b788: 0x0000003fa805b790 0x000000000006913c 0x0000003fa805b798: 0x0000003fa805b7a0 0x000000000004ed90 0x0000003fa805b7a8: 0x00000000001b200b 0x000000000000002a 0x0000003fa805b7b8: <0x0000000000000080 0x00000000001b200b 0x0000003fa805b7c8: 0x000000000000002a 0x0000000000000000 0x0000003fa805b7d8: 0x0000000000000080 0x0000000000000003 0x0000003fa805b7e8: >0x0000000000000080 0x0000000000068304 0x0000003fa805b7f8: 0x0000003fac01b000 0x0000000000000001 0x0000003fa805b808: 0x00030cb3a06830c3 0x00030cb3a06830c3 0x0000003fa805b818: 0x0000003fac01b000 0x000012bc998f4621 0x0000003fa805b828: 0x0000000000067f94 0x000000000015ae20 0x0000003fa805b838: 0x0000003fa802ef58 0x000000000015ada4 0x0000003fa805b848: 0x0000003fa80c0680 0x0000003fa8030900 0x0000003fa805b858: 0x0000000000000000 0x0000010100000000 0x0000003fa805b868: 0x0000000000000000 0x0000000000000000 0x0000003fa805b878: 0x0000000000067f28 0x0000000000000000 0x0000003fa805b888: 0x0000000000000000 0x0000003fa805b8e0 0x0000003fa805b898: 0x0000000000067e20 0x0000000000000017 0x0000003fa805b8a8: 0x0000003fa8000ea0 0x0000000000000000 0x0000003fa805b8b8: 0x0000000000000000 0x0000000000000000 0x0000003fa805b8c8: 0x0000003fa805b990 0x0000003fa805ba10 0x0000003fa805b8d8: 0x00000000000862cc 0x0000003f00000017 runtime.sigpanic() /usr/local/go/src/runtime/signal_unix.go:732 +0x29c goroutine 8 [running]: fatal error: unexpected signal during runtime execution panic during panic [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78ed0] goroutine 0 [idle]: runtime: unexpected return pc for runtime.sigpanic called from 0x80 stack: frame={sp:0x3fa805b7b8, fp:0x3fa805b7e8} stack=[0x3fa8054000,0x3fa805c000) 0x0000003fa805b6b8: 0x0000000000000000 0x000000000004f024 0x0000003fa805b6c8: 0x0000003fa8000ea0 0x0000003fa805b790 0x0000003fa805b6d8: 0x0000000000000000 0x0000003fa8000ea0 0x0000003fa805b6e8: 0x010000000004f35c 0x0000000000000004 0x0000003fa805b6f8: 0x000000000000001f 0x0000000000000080 0x0000003fa805b708: 0x0000000000000080 0x0000000000000001 0x0000003fa805b718: 0x00000000001aed2c 0x000000000004efb4 0x0000003fa805b728: 0x0000003fa8000ea0 0x000000000004ed78 0x0000003fa805b738: 0x0000003fa805b790 0x0000000000050998 0x0000003fa805b748: 0x0000003fa805b790 0x000000000004ed78 0x0000003fa805b758: 0x0000003fa8000ea0 0x000000000004ed78 0x0000003fa805b768: 0x0000003fa805b770 0x000000000004efc8 0x0000003fa805b778: 0x0000003fa8000ea0 0x000000000004ed78 0x0000003fa805b788: 0x0000003fa805b790 0x000000000006913c 0x0000003fa805b798: 0x0000003fa805b7a0 0x000000000004ed90 0x0000003fa805b7a8: 0x00000000001b200b 0x000000000000002a 0x0000003fa805b7b8: <0x0000000000000080 0x00000000001b200b 0x0000003fa805b7c8: 0x000000000000002a 0x0000000000000000 0x0000003fa805b7d8: 0x0000000000000080 0x0000000000000003 0x0000003fa805b7e8: >0x0000000000000080 0x0000000000068304 0x0000003fa805b7f8: 0x0000003fac01b000 0x0000000000000001 0x0000003fa805b808: 0x00030cb3a06830c3 0x00030cb3a06830c3 0x0000003fa805b818: 0x0000003fac01b000 0x000012bc998f4621 0x0000003fa805b828: 0x0000000000067f94 0x000000000015ae20 0x0000003fa805b838: 0x0000003fa802ef58 0x000000000015ada4 0x0000003fa805b848: 0x0000003fa80c0680 0x0000003fa8030900 0x0000003fa805b858: 0x0000000000000000 0x0000010100000000 0x0000003fa805b868: 0x0000000000000000 0x0000000000000000 0x0000003fa805b878: 0x0000000000067f28 0x0000000000000000 0x0000003fa805b888: 0x0000000000000000 0x0000003fa805b8e0 0x0000003fa805b898: 0x0000000000067e20 0x0000000000000017 0x0000003fa805b8a8: 0x0000003fa8000ea0 0x0000000000000000 0x0000003fa805b8b8: 0x0000000000000000 0x0000000000000000 0x0000003fa805b8c8: 0x0000003fa805b990 0x0000003fa805ba10 0x0000003fa805b8d8: 0x00000000000862cc 0x0000003f00000017 runtime.throw({0x1b200b, 0x2a}) /usr/local/go/src/runtime/panic.go:1198 +0x60 runtime.sigpanic() /usr/local/go/src/runtime/signal_unix.go:732 +0x29c runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0x3fa80c0680, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0) /usr/local/go/src/runtime/traceback.go:247 +0x4d8 runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0x3fa80c0680, 0x0) /usr/local/go/src/runtime/traceback.go:823 +0x14c runtime.traceback(...) /usr/local/go/src/runtime/traceback.go:777 runtime.tracebackothers(0x3fa8000ea0) /usr/local/go/src/runtime/traceback.go:1015 +0xcc runtime.dopanic_m(0x3fa8000ea0, 0x4ed78, 0x3fa805b790) /usr/local/go/src/runtime/panic.go:1398 +0x350 runtime.fatalthrow.func1() /usr/local/go/src/runtime/panic.go:1253 +0x5c runtime.fatalthrow() /usr/local/go/src/runtime/panic.go:1250 +0x4c runtime.throw({0x1b200b, 0x2a}) /usr/local/go/src/runtime/panic.go:1198 +0x60 runtime: unexpected return pc for runtime.sigpanic called from 0x80 stack: frame={sp:0x3fa805b7b8, fp:0x3fa805b7e8} stack=[0x3fa8054000,0x3fa805c000) 0x0000003fa805b6b8: 0x0000000000000000 0x000000000004f024 0x0000003fa805b6c8: 0x0000003fa8000ea0 0x0000003fa805b790 0x0000003fa805b6d8: 0x0000000000000000 0x0000003fa8000ea0 0x0000003fa805b6e8: 0x010000000004f35c 0x0000000000000004 0x0000003fa805b6f8: 0x000000000000001f 0x0000000000000080 0x0000003fa805b708: 0x0000000000000080 0x0000000000000001 0x0000003fa805b718: 0x00000000001aed2c 0x000000000004efb4 0x0000003fa805b728: 0x0000003fa8000ea0 0x000000000004ed78 0x0000003fa805b738: 0x0000003fa805b790 0x0000000000050998 0x0000003fa805b748: 0x0000003fa805b790 0x000000000004ed78 0x0000003fa805b758: 0x0000003fa8000ea0 0x000000000004ed78 0x0000003fa805b768: 0x0000003fa805b770 0x000000000004efc8 0x0000003fa805b778: 0x0000003fa8000ea0 0x000000000004ed78 0x0000003fa805b788: 0x0000003fa805b790 0x000000000006913c 0x0000003fa805b798: 0x0000003fa805b7a0 0x000000000004ed90 0x0000003fa805b7a8: 0x00000000001b200b 0x000000000000002a 0x0000003fa805b7b8: <0x0000000000000080 0x00000000001b200b 0x0000003fa805b7c8: 0x000000000000002a 0x0000000000000000 0x0000003fa805b7d8: 0x0000000000000080 0x0000000000000003 0x0000003fa805b7e8: >0x0000000000000080 0x0000000000068304 0x0000003fa805b7f8: 0x0000003fac01b000 0x0000000000000001 0x0000003fa805b808: 0x00030cb3a06830c3 0x00030cb3a06830c3 0x0000003fa805b818: 0x0000003fac01b000 0x000012bc998f4621 0x0000003fa805b828: 0x0000000000067f94 0x000000000015ae20 0x0000003fa805b838: 0x0000003fa802ef58 0x000000000015ada4 0x0000003fa805b848: 0x0000003fa80c0680 0x0000003fa8030900 0x0000003fa805b858: 0x0000000000000000 0x0000010100000000 0x0000003fa805b868: 0x0000000000000000 0x0000000000000000 0x0000003fa805b878: 0x0000000000067f28 0x0000000000000000 0x0000003fa805b888: 0x0000000000000000 0x0000003fa805b8e0 0x0000003fa805b898: 0x0000000000067e20 0x0000000000000017 0x0000003fa805b8a8: 0x0000003fa8000ea0 0x0000000000000000 0x0000003fa805b8b8: 0x0000000000000000 0x0000000000000000 0x0000003fa805b8c8: 0x0000003fa805b990 0x0000003fa805ba10 0x0000003fa805b8d8: 0x00000000000862cc 0x0000003f00000017 runtime.sigpanic() /usr/local/go/src/runtime/signal_unix.go:732 +0x29c >> begin log 1 << [0.026027891 P 0] sighandler m 0 sp 0x3fa80098d8 sig 23 pc 0x15ae54 g 0 curg 6 gp 6 [0.034801424 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 23 pc 0x15ae28 g 0 curg 8 gp 8 [0.036000768 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 27 pc 0x15ae44 g 0 curg 8 gp 8 [0.044301756 P 0] sighandler m 0 sp 0x3fa80098d8 sig 23 pc 0x15ae64 g 0 curg 6 gp 6 [0.046072521 P 0] sighandler m 0 sp 0x3fa80098d8 sig 27 pc 0x15ae84 g 0 curg 6 gp 6 [0.054425260 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 23 pc 0x15ae40 g 0 curg 8 gp 8 [0.056001690 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 27 pc 0x15ae3c g 0 curg 8 gp 8 [0.064551597 P 0] sighandler m 0 sp 0x3fa80098d8 sig 23 pc 0x15ae68 g 0 curg 6 gp 6 [0.065965193 P 0] sighandler m 0 sp 0x3fa80098d8 sig 27 pc 0x15ae6c g 0 curg 6 gp 6 [0.074677393 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 23 pc 0x15ae30 g 0 curg 8 gp 8 [0.075943070 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 27 pc 0x15ae44 g 0 curg 8 gp 8 [0.084789021 P 0] sighandler m 0 sp 0x3fa80098d8 sig 23 pc 0x15ae68 g 0 curg 6 gp 6 [0.085958823 P 0] sighandler m 0 sp 0x3fa80098d8 sig 27 pc 0x15ae5c g 0 curg 6 gp 6 [0.094915150 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 23 pc 0x15ae30 g 0 curg 8 gp 8 [0.095942742 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 27 pc 0x15ae20 g 0 curg 8 gp 8 [0.105028154 P 0] sighandler m 0 sp 0x3fa80098d8 sig 23 pc 0x15ae60 g 0 curg 6 gp 6 [0.105966245 P 0] sighandler m 0 sp 0x3fa80098d8 sig 27 pc 0x15ae84 g 0 curg 6 gp 6 [0.115148491 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 23 pc 0x15ae30 g 0 curg 8 gp 8 [0.115941248 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 27 pc 0x15ae1c g 0 curg 8 gp 8 [0.125269703 P 0] sighandler m 0 sp 0x3fa80098d8 sig 23 pc 0x15ae64 g 0 curg 6 gp 6 [0.125964126 P 0] sighandler m 0 sp 0x3fa80098d8 sig 27 pc 0x15ae6c g 0 curg 6 gp 6 [0.135386165 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 23 pc 0x15ae40 g 0 curg 8 gp 8 [0.135951878 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 27 pc 0x15ae1c g 0 curg 8 gp 8 [0.145511377 P 0] sighandler m 0 sp 0x3fa80098d8 sig 23 pc 0x15ae80 g 0 curg 6 gp 6 [0.145979965 P 0] sighandler m 0 sp 0x3fa80098d8 sig 27 pc 0x15ae68 g 0 curg 6 gp 6 [0.155635756 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 23 pc 0x15ae40 g 0 curg 8 gp 8 [0.155944217 P 1] sighandler m 3 sp 0x3fa805b8d8 sig 27 pc 0x15ae20 g 0 curg 8 gp 8 [0.165747218 P 0] sighandler m 0 sp 0x3fa80098d8 sig 23 pc 0x15ae70 g 0 curg 6 gp 6 [0.165960137 P 0] sighandler m 0 sp 0x3fa80098d8 sig 27 pc 0x15ae7c g 0 curg 6 gp 6 >> begin log 0 << [0.175955556 P 1] sighandler m 3 sp 0x3fa805b0c8 sig 27 pc 0x16c34 g 0 curg 8 gp 0 [0.176103224 P 1] sighandler m 3 sp 0x3fa805b0c8 sig 11 pc 0x80 g 0 curg 8 gp 0 [0.176851689 P 0] sighandler m 0 sp 0x3fa80098d8 sig 23 pc 0x15ae6c g 0 curg 6 gp 6 [0.184034501 P 1] sighandler m 3 sp 0x3fa805aa48 sig 11 pc 0x78ed0 g 0 curg 8 gp 0 [0.186848275 P 1] sighandler m 3 sp 0x3fa805a378 sig 27 pc 0x86064 g 0 curg 8 gp 0 ERROR: exit status 2 ```
rhysh commented 2 years ago

On linux/amd64, adding strace -f -ttt -e rt_sigaction shows lines that set sa_mask=~[], which I take to mean "all bits set".

On linux/riscv64, the strace output shows sa_mask=[], which I take to mean "all bits cleared". When I modify runtime.setsig to set the sa_restorer field, I see it reflected in strace's view of the sa_mask field.

It looks from this like the definition of runtime.sigactiont is not correct on linux/riscv64, resulting in no signals at all being blocked when handling a signal.

https://github.com/golang/go/blob/f598e2962d3a358b59faa68471b6ed378fc68806/src/runtime/defs_linux_riscv64.go#L124-L129

cherrymui commented 2 years ago

Good finding. From the kernel C header

struct sigaction {
        __sighandler_t sa_handler;
        unsigned long sa_flags;
#ifdef SA_RESTORER
        __sigrestore_t sa_restorer;
#endif
        sigset_t sa_mask;               /* mask last for extensibility */
};

So it does look wrong if SA_RESTORER is not defined. And it does look like not defined on riscv64.

gopherbot commented 2 years ago

Change https://golang.org/cl/367635 mentions this issue: runtime: fix riscv64 sigaction mask field offset