golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

runtime: non-empty mark queue after concurrent mark #69803

Open neild opened 2 weeks ago

neild commented 2 weeks ago

I hit a test failure on https://go.dev/cl/617376/2, though I don't think that CL was responsible for it. The only builder to fail was gotip-linux-amd64-aliastypeparams.

runtime: full=0xc0000d38000006 next=129 jobs=128 nDataRoots=1 nBSSRoots=1 nSpanRoots=16 nStackRoots=108
panic: non-empty mark queue after concurrent mark
fatal error: panic on system stack

runtime stack:
runtime.throw({0x6dcdac?, 0x73cf70?})
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/panic.go:1074 +0x48 fp=0x7fdc4b7fdd30 sp=0x7fdc4b7fdd00 pc=0x472a48
panic({0x67f660?, 0x73cf70?})
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/panic.go:751 +0x33b fp=0x7fdc4b7fdde0 sp=0x7fdc4b7fdd30 pc=0x47295b
runtime.gcMark(0x47b2ed?)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/mgc.go:1531 +0x3ec fp=0x7fdc4b7fde58 sp=0x7fdc4b7fdde0 pc=0x41b78c
runtime.gcMarkTermination.func1()
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/mgc.go:980 +0x17 fp=0x7fdc4b7fde70 sp=0x7fdc4b7fde58 pc=0x41ab77
runtime.systemstack(0x800000)
    /home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/asm_amd64.s:514 +0x4a fp=0x7fdc4b7fde80 sp=0x7fdc4b7fde70 pc=0x47792a

Full log at: https://logs.chromium.org/logs/golang/buildbucket/cr-buildbucket/8734717093063296065/+/u/step/11/log/2?format=raw

mknyszek commented 2 weeks ago

This suggests there was an outstanding GC mark buf after we had already passed through mark termination, meaning the mark termination algorithm failed to catch something. It could also mean that GC work was generated during mark termination in a way that we missed.
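
For reading the log line against that description, here is a minimal sketch. It assumes the panic fires when either the list of full mark buffers is non-zero or some root jobs have not yet been handed out; that condition is my reading of the check and is not quoted in this thread. The type and function names are hypothetical; only the numbers come from the log above.

```go
package main

import "fmt"

// markState mirrors the fields printed in the failure log. The type and
// field names are hypothetical; only the values come from the log.
type markState struct {
	full       uint64 // head of the list of full mark buffers (0 means empty)
	next, jobs uint32 // next root job to hand out, total number of root jobs
}

// nonEmpty reports whether mark work appears to remain, under the assumed
// condition "full != 0 || next < jobs", and says which half triggered.
func nonEmpty(s markState) (bool, string) {
	switch {
	case s.full != 0:
		return true, "full list is non-zero"
	case s.next < s.jobs:
		return true, "root jobs remain"
	}
	return false, "queue empty"
}

func main() {
	s := markState{full: 0xc0000d38000006, next: 129, jobs: 128}
	ok, why := nonEmpty(s)
	fmt.Println(ok, why)
}
```

With next=129 and jobs=128, the root-job half of the assumed condition is satisfied, so under this reading the non-zero full value is what tripped the panic.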

However, this should already be caught here: https://cs.opensource.google/go/go/+/master:src/runtime/mgc.go;l=900;drc=123594d3863b0a4b9094a569957d1bd94ebe7512

We already know that mark termination is buggy (it very rarely misses work; see #27993). But execution should never reach this backup check, and I don't see how that would be possible. The mark buf pointer looks almost legitimate, but the 0x6 in the bottom bits is incredibly fishy, which makes me think that some kind of memory corruption occurred.
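
One way to sanity-check those low bits is a sketch like the following. It assumes the printed full value is a packed word holding a pointer plus a small counter, with a 48-bit address shifted into the high bits and the low 19 bits used as a tag. That layout is an assumption on my part and is not confirmed anywhere in this thread; the constants and the unpack helper are illustrative only.

```go
package main

import "fmt"

// Assumed layout (not stated in this thread): the pointer is stored shifted
// left by 64-addrBits, and the low tagBits bits hold a counter.
const (
	addrBits = 48
	tagBits  = 64 - addrBits + 3 // 19: 16 spare high bits plus 3 alignment bits
)

// unpack splits a packed value into its address and tag under that layout.
// The signed shift sign-extends the address before the low 3 bits are restored.
func unpack(v uint64) (addr, tag uint64) {
	addr = uint64(int64(v) >> tagBits << 3)
	tag = v & (1<<tagBits - 1)
	return addr, tag
}

func main() {
	addr, tag := unpack(0xc0000d38000006)
	fmt.Printf("addr=%#x tag=%#x\n", addr, tag)
}
```

If the assumption holds, the value splits into an 8-byte-aligned address, 0xc0000d3800, and a tag of 0x6, in which case the low bits would be a counter rather than part of the pointer. If the value is instead a raw pointer, the misalignment stands and corruption remains the likelier explanation.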