golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

cmd/compile: invalid pointer found on stack when compiled with -race #63657

Closed fischerman closed 10 months ago

fischerman commented 10 months ago

What version of Go are you using (go version)?

$ go version
go version go1.21.1 linux/amd64

Does this issue reproduce with the latest release?

Yes, as of this writing "go1.21.3".

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/personal/.cache/go-build'
GOENV='/home/personal/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/personal/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/personal/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/lib/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.1'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/personal/repo/invalid-stack-pointer/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1662630679=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Run go test -race . in this go module.

I couldn't reproduce it without dependencies, but the code is quite small. I'm only "using" Ginkgo and Gomega. From another dependency I'm just using a type, which is only ever set to nil. I've added some comments.

The error only occurs in Go 1.21. Go 1.20 works fine. Also the -race flag is required.

When I remove dead code the panic doesn't occur.

What did you expect to see?

Ginkgo test results.

```
Running Suite: Stackit Suite - /work
====================================
Random Seed: 1697889786

Will run 1 of 1 specs
------------------------------
• [FAILED] [0.001 seconds]
f [It] no 'invalid pointer found on stack' please
/work/suite_test.go:17

  [FAILED] Unexpected error:
      <*errors.errorString | 0xc0000640a0>:
      not even related to the call to f
      {
          s: "not even related to the call to f",
      }
  occurred
  In [It] at: /work/suite_test.go:19 @ 10/21/23 12:03:06.601
------------------------------

Summarizing 1 Failure:
  [FAIL] f [It] no 'invalid pointer found on stack' please
  /work/suite_test.go:19

Ran 1 of 1 Specs in 0.002 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 0 Skipped
--- FAIL: TestStackit (0.00s)
FAIL
FAIL	github.com/fischerman/invalid-stack-pointer	0.015s
FAIL
```

What did you see instead?

fatal error: invalid pointer found on stack
```
Running Suite: Stackit Suite - /work
====================================
Random Seed: 1697889305

Will run 1 of 1 specs
runtime: bad pointer in frame github.com/fischerman/invalid-stack-pointer.glob..func1.1 at 0xc0000bdee0: 0x10
fatal error: invalid pointer found on stack

runtime stack:
runtime.throw({0xa16d05?, 0xd35080?})
	/usr/local/go/src/runtime/panic.go:1077 +0x5c fp=0x7faaae6e78b8 sp=0x7faaae6e7888 pc=0x47019c
runtime.adjustpointers(0x7faaae6e7b30?, 0x7faaae6e7978, 0x498605?, {0x7faaae6e7b30?, 0x0?})
	/usr/local/go/src/runtime/stack.go:627 +0x1ad fp=0x7faaae6e7918 sp=0x7faaae6e78b8 pc=0x48b24d
runtime.adjustframe(0x7faaae6e7b30, 0x7faaae6e7a10)
	/usr/local/go/src/runtime/stack.go:684 +0xdb fp=0x7faaae6e79a8 sp=0x7faaae6e7918 pc=0x48b37b
runtime.copystack(0xc0001884e0, 0x800000002?)
	/usr/local/go/src/runtime/stack.go:935 +0x2c5 fp=0x7faaae6e7ca0 sp=0x7faaae6e79a8 pc=0x48bb25
runtime.newstack()
	/usr/local/go/src/runtime/stack.go:1116 +0x47f fp=0x7faaae6e7e50 sp=0x7faaae6e7ca0 pc=0x48c0df
traceback: unexpected SPWRITE function runtime.morestack
runtime.morestack()
	/usr/local/go/src/runtime/asm_amd64.s:593 +0x8f fp=0x7faaae6e7e58 sp=0x7faaae6e7e50 pc=0x4a5fef

goroutine 37 [copystack]:
fmt.(*pp).handleMethods(0xc0001b61a0, 0x73)
	/usr/local/go/src/fmt/print.go:621 +0x6f0 fp=0xc0000bd810 sp=0xc0000bd808 pc=0x541f30
fmt.(*pp).printArg(0xc0001b61a0, {0x9fe980?, 0x9936a0}, 0x73)
	/usr/local/go/src/fmt/print.go:756 +0xccf fp=0xc0000bd8f0 sp=0xc0000bd810 pc=0x542e8f
fmt.(*pp).doPrintf(0xc0001b61a0, {0xa0bda2, 0x9}, {0xc0000bdb68?, 0x2, 0x2})
	/usr/local/go/src/fmt/print.go:1077 +0x590 fp=0xc0000bda38 sp=0xc0000bd8f0 pc=0x547910
fmt.Sprintf({0xa0bda2, 0x9}, {0xc000185b68, 0x2, 0x2})
	/usr/local/go/src/fmt/print.go:239 +0x5d fp=0xc0000bda90 sp=0xc0000bda38 pc=0x53da7d
github.com/onsi/gomega/format.formatType({0x9936a0?, 0xc00019c090?, 0xc00019c090?})
	/go/pkg/mod/github.com/onsi/gomega@v1.27.10/format/format.go:299 +0x545 fp=0xc0000bdbc8 sp=0xc0000bda90 pc=0x939c05
github.com/onsi/gomega/format.Object({0x9936a0, 0xc00019c090}, 0xc00019c090?)
	/go/pkg/mod/github.com/onsi/gomega@v1.27.10/format/format.go:265 +0x252 fp=0xc0000bdd10 sp=0xc0000bdbc8 pc=0x9392b2
github.com/onsi/gomega/matchers.(*HaveOccurredMatcher).NegatedFailureMessage(0x7faaf6b0c228?, {0x9936a0, 0xc00019c090})
	/go/pkg/mod/github.com/onsi/gomega@v1.27.10/matchers/have_occurred_matcher.go:34 +0x3a fp=0xc0000bdd78 sp=0xc0000bdd10 pc=0x9537fa
github.com/onsi/gomega/internal.(*Assertion).match(0xc0001d2040, {0xaedf78, 0xdc1b80}, 0x0, {0x0, 0x0, 0x0})
	/go/pkg/mod/github.com/onsi/gomega@v1.27.10/internal/assertion.go:103 +0x1d6 fp=0xc0000bde48 sp=0xc0000bdd78 pc=0x93d3f6
github.com/onsi/gomega/internal.(*Assertion).NotTo(0xc0001d2040, {0xaedf78, 0xdc1b80}, {0x0, 0x0, 0x0})
	/go/pkg/mod/github.com/onsi/gomega@v1.27.10/internal/assertion.go:74 +0x11e fp=0xc0000bdea8 sp=0xc0000bde48 pc=0x93d01e
github.com/fischerman/invalid-stack-pointer.glob..func1.1()
	/work/suite_test.go:19 +0xc6 fp=0xc0000bdf00 sp=0xc0000bdea8 pc=0x954446
github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3({0x0, 0x0})
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/internal/node.go:463 +0x2f fp=0xc0000bdf20 sp=0xc0000bdf00 pc=0x9168cf
github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func3()
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/internal/suite.go:889 +0x106 fp=0xc0000bdfe0 sp=0xc0000bdf20 pc=0x931046
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000bdfe8 sp=0xc0000bdfe0 pc=0x4a7e81
created by github.com/onsi/ginkgo/v2/internal.(*Suite).runNode in goroutine 19
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/internal/suite.go:876 +0x1345

goroutine 1 [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x18?, 0xc6?, 0x18?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0002316a8 sp=0xc000231688 pc=0x47308e
runtime.chanrecv(0xc00025e380, 0xc00023178f, 0x1)
	/usr/local/go/src/runtime/chan.go:583 +0x385 fp=0xc000231720 sp=0xc0002316a8 pc=0x43e4a5
runtime.chanrecv1(0xa03500?, 0x981b80?)
	/usr/local/go/src/runtime/chan.go:442 +0x12 fp=0xc000231748 sp=0xc000231720 pc=0x43e112
testing.(*T).Run(0xc00029a000, {0xa0c57b, 0xb}, 0xa47cc0)
	/usr/local/go/src/testing/testing.go:1649 +0x856 fp=0xc000231868 sp=0xc000231748 pc=0x586f16
testing.runTests.func1(0x0?)
	/usr/local/go/src/testing/testing.go:2054 +0x85 fp=0xc0002318c0 sp=0xc000231868 pc=0x58aa45
testing.tRunner(0xc00029a000, 0xc000231b08)
	/usr/local/go/src/testing/testing.go:1595 +0x239 fp=0xc0002319d8 sp=0xc0002318c0 pc=0x585699
testing.runTests(0xc0000a99a0?, {0xd6b840, 0x1, 0x1}, {0x1c?, 0x4a9539?, 0xd92340?})
	/usr/local/go/src/testing/testing.go:2052 +0x897 fp=0xc000231b38 sp=0xc0002319d8 pc=0x58a8b7
testing.(*M).Run(0xc0000a99a0)
	/usr/local/go/src/testing/testing.go:1925 +0xb58 fp=0xc000231eb8 sp=0xc000231b38 pc=0x5880d8
main.main()
	_testmain.go:47 +0x2be fp=0xc000231f40 sp=0xc000231eb8 pc=0x95489e
runtime.main()
	/usr/local/go/src/runtime/proc.go:267 +0x2bb fp=0xc000231fe0 sp=0xc000231f40 pc=0x472c1b
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000231fe8 sp=0xc000231fe0 pc=0x4a7e81

goroutine 2 [force gc (idle)]:
runtime.gopark(0xd2d210?, 0xd930e0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00004e7a8 sp=0xc00004e788 pc=0x47308e
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.forcegchelper()
	/usr/local/go/src/runtime/proc.go:322 +0xb3 fp=0xc00004e7e0 sp=0xc00004e7a8 pc=0x472ef3
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00004e7e8 sp=0xc00004e7e0 pc=0x4a7e81
created by runtime.init.6 in goroutine 1
	/usr/local/go/src/runtime/proc.go:310 +0x1a

goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00005ef78 sp=0xc00005ef58 pc=0x47308e
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.bgsweep(0x0?)
	/usr/local/go/src/runtime/mgcsweep.go:280 +0x94 fp=0xc00005efc8 sp=0xc00005ef78 pc=0x45d234
runtime.gcenable.func1()
	/usr/local/go/src/runtime/mgc.go:200 +0x25 fp=0xc00005efe0 sp=0xc00005efc8 pc=0x4523e5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00005efe8 sp=0xc00005efe0 pc=0x4a7e81
created by runtime.gcenable in goroutine 1
	/usr/local/go/src/runtime/mgc.go:200 +0x66

goroutine 4 [GC scavenge wait]:
runtime.gopark(0xc00002a070?, 0xae7950?, 0x1?, 0x0?, 0xc0000071e0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000064f70 sp=0xc000064f50 pc=0x47308e
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.(*scavengerState).park(0xd92560)
	/usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000064fa0 sp=0xc000064f70 pc=0x45aae9
runtime.bgscavenge(0x0?)
	/usr/local/go/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc000064fc8 sp=0xc000064fa0 pc=0x45b05c
runtime.gcenable.func2()
	/usr/local/go/src/runtime/mgc.go:201 +0x25 fp=0xc000064fe0 sp=0xc000064fc8 pc=0x452385
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000064fe8 sp=0xc000064fe0 pc=0x4a7e81
created by runtime.gcenable in goroutine 1
	/usr/local/go/src/runtime/mgc.go:201 +0xa5

goroutine 18 [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000184e28 sp=0xc000184e08 pc=0x47308e
runtime.runfinq()
	/usr/local/go/src/runtime/mfinal.go:193 +0x13b fp=0xc000184fe0 sp=0xc000184e28 pc=0x45145b
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000184fe8 sp=0xc000184fe0 pc=0x4a7e81
created by runtime.createfing in goroutine 1
	/usr/local/go/src/runtime/mfinal.go:163 +0x3d

goroutine 19 [select]:
runtime.gopark(0xc0001db7d0?, 0x5?, 0xe5?, 0x9e?, 0xc0001db2c6?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc0001dad88 sp=0xc0001dad68 pc=0x47308e
runtime.selectgo(0xc0001db7d0, 0xc0001db2bc, 0xd92340?, 0x0, 0x946261?, 0x1)
	/usr/local/go/src/runtime/select.go:327 +0x84b fp=0xc0001daed8 sp=0xc0001dad88 pc=0x484a0b
github.com/onsi/ginkgo/v2/internal.(*Suite).runNode(_, {0x2, 0x4, {0xa1d7b7, 0x2a}, 0xc0000a1d70, {{0xb71b31, 0x13}, 0x11, {0x0, ...}, ...}, ...}, ...)
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/internal/suite.go:911 +0x182f fp=0xc0001dee08 sp=0xc0001daed8 pc=0x92f0af
github.com/onsi/ginkgo/v2/internal.(*group).attemptSpec(0xc0001e2bf8, 0x1, {{0xc000198240?, 0xc0001d2000?, 0x1?}, 0x0?})
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/internal/group.go:199 +0x1125 fp=0xc0001e0e78 sp=0xc0001dee08 pc=0x90ba45
github.com/onsi/ginkgo/v2/internal.(*group).run(0xc0001e2bf8, {0xc00019a060, 0x1, 0x1})
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/internal/group.go:349 +0x1228 fp=0xc0001e2860 sp=0xc0001e0e78 pc=0x90f428
github.com/onsi/ginkgo/v2/internal.(*Suite).runSpecs(0xc000262a80, {0xa0d5b1, 0xd}, {0xdc1b80, 0x0, 0x0}, {0xc00001403b, 0x5}, 0x0, {0xc00019a040, ...})
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/internal/suite.go:489 +0x1167 fp=0xc0001e3638 sp=0xc0001e2860 pc=0x927587
github.com/onsi/ginkgo/v2/internal.(*Suite).Run(_, {_, _}, {_, _, _}, {_, _}, _, {0xaf0b10, ...}, ...)
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/internal/suite.go:130 +0x5f8 fp=0xc0001e3800 sp=0xc0001e3638 pc=0x921c98
github.com/onsi/ginkgo/v2.RunSpecs({0xaeb540, 0xc00029a1a0}, {0xa0d5b1, 0xd}, {0x0, 0x0, 0x0})
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/core_dsl.go:300 +0xe6b fp=0xc0001e3e48 sp=0xc0001e3800 pc=0x936b8b
github.com/fischerman/invalid-stack-pointer.TestStackit(0x0?)
	/work/suite_test.go:13 +0x4e fp=0xc0001e3e98 sp=0xc0001e3e48 pc=0x9542ae
testing.tRunner(0xc00029a1a0, 0xa47cc0)
	/usr/local/go/src/testing/testing.go:1595 +0x239 fp=0xc0001e3fb0 sp=0xc0001e3e98 pc=0x585699
testing.(*T).Run.func1()
	/usr/local/go/src/testing/testing.go:1648 +0x45 fp=0xc0001e3fe0 sp=0xc0001e3fb0 pc=0x587185
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0001e3fe8 sp=0xc0001e3fe0 pc=0x4a7e81
created by testing.(*T).Run in goroutine 1
	/usr/local/go/src/testing/testing.go:1648 +0x82b

goroutine 20 [select, locked to thread]:
runtime.gopark(0xc000063fa8?, 0x2?, 0x0?, 0x0?, 0xc000063fa4?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000063e08 sp=0xc000063de8 pc=0x47308e
runtime.selectgo(0xc000063fa8, 0xc000063fa0, 0x0?, 0x0, 0x2?, 0x1)
	/usr/local/go/src/runtime/select.go:327 +0x84b fp=0xc000063f58 sp=0xc000063e08 pc=0x484a0b
runtime.ensureSigM.func1()
	/usr/local/go/src/runtime/signal_unix.go:1014 +0x19f fp=0xc000063fe0 sp=0xc000063f58 pc=0x49ee1f
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000063fe8 sp=0xc000063fe0 pc=0x4a7e81
created by runtime.ensureSigM in goroutine 19
	/usr/local/go/src/runtime/signal_unix.go:997 +0xc8

goroutine 34 [syscall]:
runtime.notetsleepg(0x4aad51?, 0x4a7e81?)
	/usr/local/go/src/runtime/lock_futex.go:236 +0x29 fp=0xc00004efa0 sp=0xc00004ef68 pc=0x443ee9
os/signal.signal_recv()
	/usr/local/go/src/runtime/sigqueue.go:152 +0x29 fp=0xc00004efc0 sp=0xc00004efa0 pc=0x4a4409
os/signal.loop()
	/usr/local/go/src/os/signal/signal_unix.go:23 +0x1d fp=0xc00004efe0 sp=0xc00004efc0 pc=0x5cb19d
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00004efe8 sp=0xc00004efe0 pc=0x4a7e81
created by os/signal.Notify.func1.1 in goroutine 19
	/usr/local/go/src/os/signal/signal.go:151 +0x47

goroutine 35 [select]:
runtime.gopark(0xc00005ff78?, 0x3?, 0x0?, 0x0?, 0xc00005ff22?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00005fd88 sp=0xc00005fd68 pc=0x47308e
runtime.selectgo(0xc00005ff78, 0xc00005ff1c, 0xc00005ff28?, 0x0, 0x3?, 0x1)
	/usr/local/go/src/runtime/select.go:327 +0x84b fp=0xc00005fed8 sp=0xc00005fd88 pc=0x484a0b
github.com/onsi/ginkgo/v2/internal/interrupt_handler.(*InterruptHandler).registerForInterrupts.func2(0x0)
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/internal/interrupt_handler/interrupt_handler.go:131 +0x125 fp=0xc00005ffb8 sp=0xc00005fed8 pc=0x8fcd85
github.com/onsi/ginkgo/v2/internal/interrupt_handler.(*InterruptHandler).registerForInterrupts.func3()
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/internal/interrupt_handler/interrupt_handler.go:158 +0x42 fp=0xc00005ffe0 sp=0xc00005ffb8 pc=0x8fcc22
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00005ffe8 sp=0xc00005ffe0 pc=0x4a7e81
created by github.com/onsi/ginkgo/v2/internal/interrupt_handler.(*InterruptHandler).registerForInterrupts in goroutine 19
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/internal/interrupt_handler/interrupt_handler.go:128 +0x2bd

goroutine 36 [select]:
runtime.gopark(0xc000065fb0?, 0x2?, 0xff?, 0xff?, 0xc000065f7c?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000065df0 sp=0xc000065dd0 pc=0x47308e
runtime.selectgo(0xc000065fb0, 0xc000065f78, 0x0?, 0x0, 0x0?, 0x1)
	/usr/local/go/src/runtime/select.go:327 +0x84b fp=0xc000065f40 sp=0xc000065df0 pc=0x484a0b
github.com/onsi/ginkgo/v2/internal.RegisterForProgressSignal.func1()
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/internal/progress_report.go:32 +0xc7 fp=0xc000065fe0 sp=0xc000065f40 pc=0x91d067
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000065fe8 sp=0xc000065fe0 pc=0x4a7e81
created by github.com/onsi/ginkgo/v2/internal.RegisterForProgressSignal in goroutine 19
	/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.13.0/internal/progress_report.go:30 +0x189
FAIL	github.com/fischerman/invalid-stack-pointer	0.025s
FAIL
```
mauri870 commented 10 months ago

This seems to affect only 1.21.x; I was unable to reproduce it on 1.20 or tip. Reproduced on both darwin/arm64 and linux/amd64.

This message in the stacktrace caught my attention: traceback: unexpected SPWRITE function runtime.morestack

mauri870 commented 10 months ago

cc @golang/compiler

cherrymui commented 10 months ago

@mauri870 CL https://go.dev/cl/531815 fixes the "unexpected SPWRITE" message. But that is just a message, unrelated to the original bad pointer bug.

mauri870 commented 10 months ago

That sounded like a red herring, thanks for clarifying it.

cuonglm commented 10 months ago

The program started failing with https://go-review.googlesource.com/c/go/+/270940, and was then "fixed" by https://go-review.googlesource.com/c/go/+/517775

Kindly cc @randall77 and @mdempsky to decide what we should do.

randall77 commented 10 months ago

This looks like a bad reordering of a nil pointer check and subsequent pointer arithmetic.

  9542a0:       e8 5b fe ff ff          call   954100 <github.com/fischerman/invalid-stack-pointer.f>
  9542a5:       48 89 44 24 30          mov    %rax,0x30(%rsp)

f returns nil, result is written to 0x30(SP).

  9542aa:       48 8d 05 ef 4b 05 00    lea    0x54bef(%rip),%rax        # 9a8ea0 <type:*+0x53ea0>
  9542b1:       e8 0a 14 af ff          call   4456c0 <runtime.newobject>
  9542b6:       48 89 44 24 40          mov    %rax,0x40(%rsp)
  9542bb:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  9542c0:       e8 fb 52 b5 ff          call   4a95c0 <runtime.racewrite>
  9542c5:       48 8b 4c 24 40          mov    0x40(%rsp),%rcx
  9542ca:       48 c7 41 08 21 00 00    movq   $0x21,0x8(%rcx)
  9542d1:       00 

Start a write barrier to initialize the new object allocated above. (In non-race mode I think this barrier is not needed, as it is writing a constant string pointer to known-zeroed memory. Alas, I think in race mode the zeroed-ness of the allocation is not detectable.)

  9542d2:       83 3d 67 2e 47 00 00    cmpl   $0x0,0x472e67(%rip)        # dc7140 <runtime.writeBarrier>
  9542d9:       74 0d                   je     9542e8 <github.com/fischerman/invalid-stack-pointer.glob..func1.1+0x68>
  9542db:       48 8b 11                mov    (%rcx),%rdx
  9542de:       66 90                   xchg   %ax,%ax
  9542e0:       e8 1b 3b b5 ff          call   4a7e00 <runtime.gcWriteBarrier1>
  9542e5:       49 89 13                mov    %rdx,(%r11)

Write barrier is done here, now to do the actual write. But first there are some other instructions! It's calculating &g.Listeners for use by a subsequent raceread call. I have no idea why these instructions appear here - they should not be before the write associated with the barrier. And in any case, they should be after the nil check below. The bad pointer is in 0x38(SP) and that's where the stack copier finds it and barfs.

  9542e8:       48 8b 54 24 30          mov    0x30(%rsp),%rdx
  9542ed:       48 8d 5a 10             lea    0x10(%rdx),%rbx
  9542f1:       48 89 5c 24 38          mov    %rbx,0x38(%rsp)

These 2 instructions are the actual write.

  9542f6:       48 8d 35 f8 43 0c 00    lea    0xc43f8(%rip),%rsi        # a186f5 <go:string.*+0xf39d>
  9542fd:       48 89 31                mov    %rsi,(%rcx)

Some uninteresting stuff follows.

  954300:       48 8d 05 21 71 19 00    lea    0x197121(%rip),%rax        # aeb428 <go:itab.*errors.errorString,error+0x8>
  954307:       e8 54 52 b5 ff          call   4a9560 <runtime.raceread>
  95430c:       48 8b 05 15 71 19 00    mov    0x197115(%rip),%rax        # aeb428 <go:itab.*errors.errorString,error+0x8>
  954313:       48 8b 5c 24 40          mov    0x40(%rsp),%rbx
  954318:       31 c9                   xor    %ecx,%ecx
  95431a:       31 ff                   xor    %edi,%edi
  95431c:       48 89 fe                mov    %rdi,%rsi
  95431f:       90                      nop
  954320:       e8 7b fa ff ff          call   953da0 <github.com/onsi/gomega.Expect>
  954325:       48 8b 48 20             mov    0x20(%rax),%rcx
  954329:       48 89 d8                mov    %rbx,%rax
  95432c:       48 8d 1d 05 9c 19 00    lea    0x199c05(%rip),%rbx        # aedf38 <go:itab.*github.com/onsi/gomega/matchers.HaveOccurredMatcher,github.com/onsi/gomega/types.GomegaMatcher>
  954333:       31 ff                   xor    %edi,%edi
  954335:       31 f6                   xor    %esi,%esi
  954337:       49 89 f0                mov    %rsi,%r8
  95433a:       48 89 ca                mov    %rcx,%rdx
  95433d:       48 8d 0d fc 27 47 00    lea    0x4727fc(%rip),%rcx        # dc6b40 <runtime.zerobase>
  954344:       ff d2                   call   *%rdx

The nil pointer check of the result of f has made it all the way down here.

  954346:       48 8b 4c 24 30          mov    0x30(%rsp),%rcx
  95434b:       84 01                   test   %al,(%rcx)

Then we actually read g.Listeners. We only compute &g.Listeners to pass to raceread, another reason this only happens in -race mode.

  95434d:       48 8b 44 24 38          mov    0x38(%rsp),%rax
  954352:       e8 09 52 b5 ff          call   4a9560 <runtime.raceread>
  954357:       48 8b 4c 24 30          mov    0x30(%rsp),%rcx
  95435c:       48 8b 59 10             mov    0x10(%rcx),%rbx
randall77 commented 10 months ago

Reminds me of #42673, which is what CL 270940 was supposed to fix, not cause. Somehow the nil check just isn't getting the priority it needs to come before the address calculation &g.Listeners.

Here's a reproducer. At least, you can see the problem in assembly. (It will need a wrapper to make it into a test.)

package main

type T struct {
    a, b int
}

func f() {
    x := p()
    gb = gi != 0
    q()
    g(&x.b)
}
func g(p *int)

func q()

func p() *T

var gi int
var gb bool

When compiled (no -race required), we get this:

    0x000e 00014 (tmp2.go:11)   CALL    main.p(SB)
    0x0013 00019 (tmp2.go:11)   MOVQ    AX, main.x+8(SP)
    0x0018 00024 (tmp2.go:14)   LEAQ    8(AX), CX
    0x001c 00028 (tmp2.go:14)   MOVQ    CX, main..autotmp_2+16(SP)
    0x0021 00033 (tmp2.go:12)   CMPQ    main.gi(SB), $0
    0x0029 00041 (tmp2.go:12)   SETNE   main.gb(SB)
    0x0030 00048 (tmp2.go:13)   CALL    main.q(SB)
    0x0035 00053 (tmp2.go:14)   MOVQ    main.x+8(SP), AX
    0x003a 00058 (tmp2.go:14)   TESTB   AL, (AX)
    0x003c 00060 (tmp2.go:14)   MOVQ    main..autotmp_2+16(SP), AX
    0x0041 00065 (tmp2.go:14)   CALL    main.g(SB)

Note the LEAQ happens before the call to q and its result is spilled to the stack. The nil check doesn't happen until just before the call to g.
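
The reproducer above declares p, q, and g without bodies, so it is only good for inspecting the assembly. A rough sketch of the kind of wrapper mentioned earlier, under some assumptions: the bodies of p, q, and g, the helper useStack, and the function panicsOnNil are all made up for illustration, and the recursion depth of 1000 is only intended to force a stack growth while f's frame is live.

```go
package main

import "fmt"

type T struct {
	a, b int
}

var (
	gi int
	gb bool
)

// p returns nil so that &x.b in f has to fail its nil check.
//
//go:noinline
func p() *T { return nil }

// q recurses deeply. The intent (an assumption, not verified on every
// platform) is to make the goroutine stack grow so the runtime copies it.
//
//go:noinline
func q() { useStack(1000) }

//go:noinline
func useStack(n int) int {
	if n == 0 {
		return 0
	}
	return useStack(n-1) + 1
}

//go:noinline
func g(p *int) {}

//go:noinline
func f() {
	x := p()
	gb = gi != 0 // low-priority flags-generating instruction
	q()
	g(&x.b) // nil check belongs before the address computation
}

// panicsOnNil reports whether f ended in an ordinary, recoverable panic.
func panicsOnNil() (panicked bool) {
	defer func() { panicked = recover() != nil }()
	f()
	return false
}

func main() {
	fmt.Println("recovered a panic from f:", panicsOnNil())
}
```

With a correct compiler, f should end in a recoverable nil-pointer panic. With an affected compiler, the expectation is that the spilled nil+8 value is found during the stack copy and the program dies with "invalid pointer found on stack" instead, though I have not confirmed that this exact wrapper triggers it.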

The trick with the gb = gi != 0 is to introduce a low-priority instruction to the scheduler. Flags-generating instructions are low-priority (we want them to issue as late as possible to minimize the result's lifetime). This causes a priority inversion, as the nil check is dependent on that low-priority instruction, so it gets delayed as well. The LEAQ is standard priority, so it gets to go first.
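
To make the priority inversion concrete, here is a toy model. It is not the compiler's scheduler: the `instr` type, the three priority levels, and the greedy forward `schedule` function are all assumptions made for illustration, and the model does not reproduce the exact instruction order shown above. It only shows how a standard-priority address computation can get ahead of a nil check that is stuck behind a low-priority instruction, and how an explicit ordering edge prevents that.

```go
package main

import "fmt"

const (
	low = iota
	standard
	high
)

// instr is a hypothetical, simplified instruction node.
type instr struct {
	name     string
	priority int   // higher is emitted earlier once ready
	deps     []int // indices of instructions that must be emitted first
}

func ready(in instr, done []bool) bool {
	for _, d := range in.deps {
		if !done[d] {
			return false
		}
	}
	return true
}

// schedule repeatedly emits the ready instruction with the highest priority,
// breaking ties by position in the input slice.
func schedule(prog []instr) []string {
	done := make([]bool, len(prog))
	var order []string
	for len(order) < len(prog) {
		best := -1
		for i, in := range prog {
			if done[i] || !ready(in, done) {
				continue
			}
			if best == -1 || in.priority > prog[best].priority {
				best = i
			}
		}
		if best == -1 {
			panic("dependency cycle")
		}
		done[best] = true
		order = append(order, prog[best].name)
	}
	return order
}

// build returns the toy program. With leaAfterNilCheck set, the address
// computation gets an explicit dependency on the nil check.
func build(leaAfterNilCheck bool) []instr {
	prog := []instr{
		0: {"CALL p", standard, nil},
		1: {"CMPQ gi, $0", low, []int{0}},
		2: {"SETNE gb", standard, []int{1}},
		3: {"NilCheck x", high, []int{0, 2}},
		4: {"LEAQ 8(x)", standard, []int{0}},
		5: {"CALL q", standard, []int{2}},
		6: {"CALL g", standard, []int{3, 4, 5}},
	}
	if leaAfterNilCheck {
		prog[4].deps = append(prog[4].deps, 3)
	}
	return prog
}

// indexOf returns the position of name in order, or -1 if absent.
func indexOf(order []string, name string) int {
	for i, n := range order {
		if n == name {
			return i
		}
	}
	return -1
}

func main() {
	for _, withEdge := range []bool{false, true} {
		order := schedule(build(withEdge))
		fmt.Println("edge:", withEdge, "LEAQ before NilCheck:",
			indexOf(order, "LEAQ 8(x)") < indexOf(order, "NilCheck x"))
	}
}
```

Even though the nil check has the highest priority in this model, it cannot become ready until the low-priority compare has been emitted, so the LEAQ wins unless something forces it to wait.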

randall77 commented 10 months ago

@gopherbot Please open a backport issue for 1.21.

gopherbot commented 10 months ago

Backport issue(s) opened: #63743 (for 1.21).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

gopherbot commented 10 months ago

Change https://go.dev/cl/537775 mentions this issue: cmd/compile: ensure pointer arithmetic happens after the nil check

gopherbot commented 10 months ago

Change https://go.dev/cl/538595 mentions this issue: cmd/compile: handle constant pointer offsets in dead store elimination