golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

runtime: segfault during conservative scan of asynchronously preempted goroutine #39499

Status: Open. jamesl33 opened this issue 4 years ago.

jamesl33 commented 4 years ago

What version of Go are you using (go version)?

$ go version
go version go1.14.1 linux/amd64

Does this issue reproduce with the latest release?

Yes, but not consistently. We have reproductions on versions up to 1.14.3; we have just updated to 1.14.4 but haven't run any tests with it yet.

What operating system and processor architecture are you using?

CentOS 7 amd64 - E5-2630 v2 (24 vCPU)

What issue are we seeing?

From a brief look at the stack trace and the runtime source, this appears to be a segfault during the conservative scan of an asynchronously preempted goroutine. We have only seen this issue since updating to 1.14.1 (we skipped 1.14), but we rely on a couple of libraries that use 'unsafe', so we wouldn't be surprised if the cause turned out to be misuse of 'unsafe' rather than a bug in the runtime itself.
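To make "conservative scan" concrete, here is a toy model in Go. It is a simplified sketch, not the runtime's code: the `span` type and `scanConservativeToy` function are hypothetical, and the real `scanConservative` locates the span with an arena lookup rather than a linear search. What it illustrates is that every word in the frame is treated as a candidate pointer, so the scanner consults heap metadata (the span and its allocation bits) for arbitrary values; if that metadata is corrupt, the lookup itself can fault.

```go
package main

import "fmt"

// span is a hypothetical, much-simplified stand-in for a runtime heap span:
// a contiguous address range split into equal-sized objects, plus one
// allocation bit per object.
type span struct {
	base     uintptr
	elemSize uintptr
	alloc    []bool // alloc[i] reports whether object i is allocated
}

// contains reports whether address p falls inside the span.
func (s *span) contains(p uintptr) bool {
	return p >= s.base && p < s.base+s.elemSize*uintptr(len(s.alloc))
}

// scanConservativeToy treats every word of a stack frame as a possible
// pointer, roughly as the runtime does for a frame that was stopped at an
// asynchronous preemption point and so has no precise pointer map. A word is
// kept only if it lands inside a span and points at an allocated object; the
// object's base address is returned so the caller could mark it.
func scanConservativeToy(frame []uintptr, spans []*span) []uintptr {
	var marked []uintptr
	for _, w := range frame {
		for _, s := range spans {
			if !s.contains(w) {
				continue
			}
			idx := (w - s.base) / s.elemSize
			// The real runtime makes this check with (*mspan).isFree, which
			// reads the span's allocation bitmap; several traces in this
			// issue fault inside that call.
			if s.alloc[idx] {
				marked = append(marked, s.base+idx*s.elemSize)
			}
			break
		}
	}
	return marked
}

func main() {
	heap := []*span{{base: 0x1000, elemSize: 16, alloc: []bool{true, false, true, false}}}
	frame := []uintptr{0x1004, 0x1010, 0x1025, 0x42, 0x2000}
	fmt.Printf("%#x\n", scanConservativeToy(frame, heap)) // [0x1000 0x1020]
}
```

Words that don't fall in any span (0x42, 0x2000) or that point at free objects (0x1010) are ignored; interior pointers (0x1004, 0x1025) are rounded down to their object's base.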

I've included the full stack trace below, along with a snippet from it showing what I've described above. Any help debugging this would be greatly appreciated, whether that's tips on which GODEBUG settings would give us more information about why this is happening, or steps we could take to debug the issue or to provide you with extra information.

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x2 addr=0x7f2f9047b8ef pc=0x42f616]

runtime stack:
runtime.throw(0xbfd246, 0x2a)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/panic.go:1114 +0x72
runtime.sigpanic()
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/signal_unix.go:679 +0x46a
runtime.(*mspan).isFree(...)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mbitmap.go:255
runtime.scanConservative(0xc002b9fbd8, 0x178, 0x0, 0xc00004a698, 0x7f2f65cb3348)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:1368 +0xf6
runtime.scanframeworker(0x7f2f65cb3238, 0x7f2f65cb3348, 0xc00004a698)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:875 +0x29d
runtime.scanstack.func1(0x7f2f65cb3238, 0x0, 0x13c7920)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:736 +0x3d
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc0006f1500, 0x0, 0x0, 0x7fffffff, 0x7f2f65cb3330, 0x0, 0x0, ...)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/traceback.go:334 +0x110e
runtime.scanstack(0xc0006f1500, 0xc00004a698)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:739 +0x15e
runtime.markroot.func1()
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:226 +0xbf
runtime.markroot(0xc00004a698, 0x153)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:199 +0x2f3
runtime.gcDrainN(0xc00004a698, 0x10000, 0x10000)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:1119 +0xff
runtime.gcAssistAlloc1(0xc000c84d80, 0x10000)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:531 +0xf3
runtime.gcAssistAlloc.func1()
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:442 +0x33
runtime.systemstack(0x0)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/proc.go:1041

stack_trace.txt

ALTree commented 4 years ago

cc @aclements @mknyszek

randall77 commented 4 years ago

This does look like corruption of internal runtime data structures.

aclements commented 4 years ago

You mentioned possible misuse of unsafe. Could you compile with -gcflags=all=-d=checkptr (note that this is implied if you're already using -race or -msan)?
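For context, `-d=checkptr` instruments `unsafe.Pointer` conversions at run time; among other things it checks that a converted pointer doesn't straddle multiple allocations and that the result of `unsafe.Pointer` arithmetic still points into the original allocation. As a hedged illustration of code that should pass those checks, `elemAt` below is a hypothetical helper (not taken from any library mentioned in this thread) that keeps the `uintptr` round trip inside a single expression, as the `unsafe` package rules require, and bounds-checks the index so the result stays inside the slice's backing array.

```go
package main

import (
	"fmt"
	"unsafe"
)

// elemAt is a hypothetical helper that returns a pointer to s[i] using
// unsafe pointer arithmetic. The conversion to uintptr and back happens in
// one expression, and the bounds check keeps the resulting pointer inside
// the same allocation as &s[0], which is what checkptr verifies.
func elemAt(s []int32, i int) *int32 {
	if i < 0 || i >= len(s) {
		panic("elemAt: index out of range")
	}
	base := unsafe.Pointer(&s[0])
	off := uintptr(i) * unsafe.Sizeof(s[0])
	return (*int32)(unsafe.Pointer(uintptr(base) + off))
}

func main() {
	s := []int32{10, 20, 30}
	*elemAt(s, 1) = 25
	fmt.Println(s) // [10 25 30]
}
```

Dropping the bounds check and passing `i == len(s)` would produce a pointer just past the backing array, which is the kind of arithmetic checkptr is meant to catch.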

Can you reproduce this with GOTRACEBACK=system or GOTRACEBACK=crash? (The latter is even more verbose, but it sends lots of signals around when the program crashes, which can in principle have bad effects.) These settings include SP values in the traceback, which should let us match up exactly which call frame is being scanned and where it's stopped.
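With `GOTRACEBACK=system` or `crash`, each file:line entry in the traceback ends with `fp=`, `sp=` and `pc=` values. A small parser such as the hypothetical `frameRegs` below (a sketch, not a standard tool) can help line frames up across traces; `fp - sp` is roughly the size of that frame. The sample line is copied from the Go 1.14.4 trace posted in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// frameRegs extracts the fp=, sp= and pc= values appended to a traceback
// location line. ok is false unless all three were found.
func frameRegs(line string) (fp, sp, pc uint64, ok bool) {
	var seen int
	for _, field := range strings.Fields(line) {
		key, val, found := strings.Cut(field, "=")
		if !found {
			continue
		}
		n, err := strconv.ParseUint(val, 0, 64) // base 0 accepts the 0x prefix
		if err != nil {
			continue
		}
		switch key {
		case "fp":
			fp, seen = n, seen|1
		case "sp":
			sp, seen = n, seen|2
		case "pc":
			pc, seen = n, seen|4
		}
	}
	return fp, sp, pc, seen == 7
}

func main() {
	line := "    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:1439 +0x22a fp=0x7f93d37fbec8 sp=0x7f93d37fbe98 pc=0x42fa4a"
	fp, sp, pc, ok := frameRegs(line)
	fmt.Printf("ok=%v pc=%#x frame size=%d bytes\n", ok, pc, fp-sp)
}
```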

jamesl33 commented 4 years ago

Thank you for the information. I've got a build running with the requested gcflags and have got the GOTRACEBACK environment variable set to crash. I'll update the issue once I've been able to reproduce the segfault.

jamesl33 commented 4 years ago

We've been able to reproduce this issue on 1.14.4 with the gcflags set as requested; however, our Jenkins environment didn't have the GOTRACEBACK variable set as we'd intended (we run multiple Jenkins instances, and the reproduction ran on one that isn't our own). I've set the environment up again on the performance machine and will keep running until I can provide an extended stack trace and, hopefully, a core dump.

For the time being, I've attached the complete stack traces for the reproductions we have so far. stack_trace-1.txt is noteworthy because we appear to fail in a different location in the runtime.

stack_trace-1.txt, stack_trace-2.txt, stack_trace-3.txt

jamesl33 commented 4 years ago

I've got a reproduction of the segfault with the requested gcflags and with the GOTRACEBACK environment variable set to include runtime-created goroutines. However, as in stack_trace-1.txt, it appears we have failed in a different location from the other stack traces. If there's anything else you need to debug the issue, please let me know and I'll do my best to provide it.

I'll leave the existing environment set up and update the issue whenever we hit more reproductions. It's also worth noting that we have since downgraded a branch to Go 1.13.12 and kept testing, and we haven't encountered this issue on it yet.
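Asynchronous preemption was introduced in Go 1.14, and the runtime documents `GODEBUG=asyncpreemptoff=1` for disabling it. Staying on 1.14 with that setting is one possible way to check whether asynchronous preemption is involved without downgrading. The sketch below is a hypothetical launcher (the name `runnopreempt` is made up) that runs another program with the setting appended to any existing `GODEBUG` value, since `GODEBUG` is a comma-separated list.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withGODEBUG returns a copy of env with setting appended to GODEBUG,
// keeping any settings that were already present.
func withGODEBUG(env []string, setting string) []string {
	out := make([]string, 0, len(env)+1)
	found := false
	for _, kv := range env {
		if strings.HasPrefix(kv, "GODEBUG=") {
			cur := strings.TrimPrefix(kv, "GODEBUG=")
			if cur != "" {
				cur += ","
			}
			kv = "GODEBUG=" + cur + setting
			found = true
		}
		out = append(out, kv)
	}
	if !found {
		out = append(out, "GODEBUG="+setting)
	}
	return out
}

// main runs the program named on the command line with asynchronous
// preemption disabled, e.g. `runnopreempt ./server -flag`.
func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: runnopreempt <program> [args...]")
		return
	}
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Env = withGODEBUG(os.Environ(), "asyncpreemptoff=1")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Setting the variable directly in the test environment has the same effect; the wrapper only helps when the environment is awkward to change, as with the Jenkins setup described above.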

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x2 addr=0x7f940c242512 pc=0x42fa4a]

runtime stack:
runtime.throw(0xc26236, 0x2a)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/panic.go:1116 +0x72 fp=0x7f93d37fbe68 sp=0x7f93d37fbe38 pc=0x443502
runtime.sigpanic()
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/signal_unix.go:679 +0x46a fp=0x7f93d37fbe98 sp=0x7f93d37fbe68 pc=0x459cfa
runtime.markBits.setMarked(...)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mbitmap.go:295
runtime.greyobject(0xc0011a5100, 0xc0031efcd8, 0x1baf0, 0x7f93e00b5db8, 0xc000031698, 0x99b10)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:1439 +0x22a fp=0x7f93d37fbec8 sp=0x7f93d37fbe98 pc=0x42fa4a
runtime.scanConservative(0xc0031efcd8, 0x20068, 0x0, 0xc000031698, 0x7f93d37fc350)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:1374 +0x15d fp=0x7f93d37fbf18 sp=0x7f93d37fbec8 pc=0x42f69d
runtime.scanframeworker(0x7f93d37fc240, 0x7f93d37fc350, 0xc000031698)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:875 +0x29d fp=0x7f93d37fbfa8 sp=0x7f93d37fbf18 pc=0x42e99d
runtime.scanstack.func1(0x7f93d37fc240, 0x0, 0x1413ec0)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:736 +0x3d fp=0x7f93d37fbfd0 sp=0x7f93d37fbfa8 pc=0x46eebd
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc00011ad80, 0x0, 0x0, 0x7fffffff, 0x7f93d37fc338, 0x0, 0x0, ...)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/traceback.go:334 +0x110e fp=0x7f93d37fc2a8 sp=0x7f93d37fbfd0 pc=0x467d7e
runtime.scanstack(0xc00011ad80, 0xc000031698)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:739 +0x15e fp=0x7f93d37fc4b0 sp=0x7f93d37fc2a8 pc=0x42e07e
runtime.markroot.func1()
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:226 +0xbf fp=0x7f93d37fc500 sp=0x7f93d37fc4b0 pc=0x46ed5f
runtime.markroot(0xc000031698, 0xc3)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:199 +0x2f3 fp=0x7f93d37fc580 sp=0x7f93d37fc500 pc=0x42d113
runtime.gcDrain(0xc000031698, 0x3)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:999 +0x107 fp=0x7f93d37fc5d8 sp=0x7f93d37fc580 pc=0x42ead7
runtime.gcBgMarkWorker.func2()
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgc.go:1940 +0x80 fp=0x7f93d37fc618 sp=0x7f93d37fc5d8 pc=0x46eb70
runtime.systemstack(0x0)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/asm_amd64.s:370 +0x66 fp=0x7f93d37fc620 sp=0x7f93d37fc618 pc=0x471816
runtime.mstart()
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/proc.go:1041 fp=0x7f93d37fc628 sp=0x7f93d37fc620 pc=0x4482e0

stack_trace-4.txt

jamesl33 commented 4 years ago

A quick update: we've just moved to Go 1.15 and will continue running our performance testing to see whether the issue still reproduces (the performance cluster we were reproducing on recently had some issues and downtime). In the meantime, is there anything else we can provide to help debug the issue?

YuhangMa1117 commented 3 months ago
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x2 addr=0x7f024c1505ab pc=0x4226d4]

runtime stack:
runtime.throw({0x11d71fc?, 0x1c97f9670?})
    /usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0x7f01c97f9640 sp=0x7f01c97f9610 pc=0x439b5d
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:819 +0x369 fp=0x7f01c97f9690 sp=0x7f01c97f9640 pc=0x450409
runtime.(*mspan).isFree(...)
    /usr/local/go/src/runtime/mbitmap.go:231
runtime.scanConservative(0xc00055c960, 0x1088, 0x0, 0x46dea1?, 0x7f01c97f9b68)
    /usr/local/go/src/runtime/mgcmark.go:1432 +0x134 fp=0x7f01c97f96e8 sp=0x7f01c97f9690 pc=0x4226d4
runtime.scanframeworker(0x7f01c97f9a80, 0x7f01c97f9b68, 0x1?)
    /usr/local/go/src/runtime/mgcmark.go:941 +0x148 fp=0x7f01c97f9750 sp=0x7f01c97f96e8 pc=0x421968
runtime.scanstack.func1(0x1ad08e0?, 0x1b47760?)
    /usr/local/go/src/runtime/mgcmark.go:801 +0x25 fp=0x7f01c97f9778 sp=0x7f01c97f9750 pc=0x4217e5
runtime.gentraceback(0x20?, 0x1366128?, 0x0?, 0x7f01c97f9b68?, 0x0, 0x0, 0x7fffffff, 0x7f01c97f9cc8, 0x0?, 0x0)
    /usr/local/go/src/runtime/traceback.go:334 +0xd0d fp=0x7f01c97f9ae8 sp=0x7f01c97f9778 pc=0x4602ed
runtime.scanstack(0xc0026489c0, 0xc000055740)
    /usr/local/go/src/runtime/mgcmark.go:804 +0x1da fp=0x7f01c97f9cf0 sp=0x7f01c97f9ae8 pc=0x42125a
runtime.markroot.func1()
    /usr/local/go/src/runtime/mgcmark.go:240 +0xc5 fp=0x7f01c97f9d40 sp=0x7f01c97f9cf0 pc=0x420085
runtime.markroot(0xc000055740, 0x11b6, 0x1)
    /usr/local/go/src/runtime/mgcmark.go:213 +0x1a5 fp=0x7f01c97f9de0 sp=0x7f01c97f9d40 pc=0x41fd25
runtime.gcDrain(0xc000055740, 0x2)
    /usr/local/go/src/runtime/mgcmark.go:1069 +0x39f fp=0x7f01c97f9e40 sp=0x7f01c97f9de0 pc=0x421dff
runtime.gcBgMarkWorker.func2()
    /usr/local/go/src/runtime/mgc.go:1323 +0x154 fp=0x7f01c97f9e90 sp=0x7f01c97f9e40 pc=0x41e3d4
runtime.systemstack()
    /usr/local/go/src/runtime/asm_amd64.s:492 +0x49 fp=0x7f01c97f9e98 sp=0x7f01c97f9e90 pc=0x46bce9

@mknyszek We caught a crash whose stack trace is similar to the ones above; the stack may be helpful. The Go version is 1.19.

mknyszek commented 3 months ago

@YuhangMa1117 Sorry, but Go 1.19 is no longer supported (https://go.dev/doc/devel/release#policy). Please try a newer version of Go -- if it still happens, feel free to reply back or file a new issue. Thanks.
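The linked policy says each major Go release is supported until there are two newer major releases, so only the latest two are supported at any time. A minimal sketch of that rule follows; `goMinor` and `supported` are hypothetical helpers, and the latest release number is an assumption that has to be maintained by hand, since the program does not look it up.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// goMinor extracts the minor number from a version string such as
// "go1.19.13" or "go1.21rc2". Development builds are rejected.
func goMinor(v string) (int, error) {
	if !strings.HasPrefix(v, "go1.") {
		return 0, fmt.Errorf("unrecognized Go version %q", v)
	}
	rest := strings.TrimPrefix(v, "go1.")
	end := 0
	for end < len(rest) && rest[end] >= '0' && rest[end] <= '9' {
		end++
	}
	if end == 0 {
		return 0, fmt.Errorf("unrecognized Go version %q", v)
	}
	return strconv.Atoi(rest[:end])
}

// supported applies the release policy: a release stays supported until
// there are two newer major releases.
func supported(minor, latestMinor int) bool {
	return minor >= latestMinor-1
}

func main() {
	const latestMinor = 22 // hypothetical value; update by hand
	minor, err := goMinor(runtime.Version())
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%s supported (assuming go1.%d is the latest release): %v\n",
		runtime.Version(), latestMinor, supported(minor, latestMinor))
}
```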

jonegenh commented 2 days ago
[signal SIGSEGV: segmentation violation code=0x2 addr=0x7fce1e91a1d8 pc=0x4264b4]

runtime stack:
runtime.throw({0xab7e9b, 0x43de90})
        /home/admin/.cache/tools/go-1.17.8/src/runtime/panic.go:1198 +0x71 fp=0x7fcdecffc528 sp=0x7fcdecffc4f8 pc=0x43ccf1
runtime.sigpanic()
        /home/admin/.cache/tools/go-1.17.8/src/runtime/signal_unix.go:719 +0x396 fp=0x7fcdecffc578 sp=0x7fcdecffc528 pc=0x452e36
runtime.(*mspan).isFree(...)
        /home/admin/.cache/tools/go-1.17.8/src/runtime/mbitmap.go:226
runtime.scanConservative(0xc009189210, 0x178, 0x0, 0xdd9648, 0x7fcdecffca50)
        /home/admin/.cache/tools/go-1.17.8/src/runtime/mgcmark.go:1374 +0x134 fp=0x7fcdecffc5d0 sp=0x7fcdecffc578 pc=0x4264b4
runtime.scanframeworker(0x7fcdecffc970, 0x7fcdecffca50, 0x1124400)
        /home/admin/.cache/tools/go-1.17.8/src/runtime/mgcmark.go:886 +0x158 fp=0x7fcdecffc640 sp=0x7fcdecffc5d0 pc=0x425778
runtime.scanstack.func1(0xdd9648, 0x1124400)
        /home/admin/.cache/tools/go-1.17.8/src/runtime/mgcmark.go:745 +0x25 fp=0x7fcdecffc668 sp=0x7fcdecffc640 pc=0x4255e5
runtime.gentraceback(0x40, 0xb598d1, 0x7fcdf45d32a0, 0x7fcdecffca50, 0x0, 0x0, 0x7fffffff, 0x7fcdecffcbb0, 0x1184308, 0x0)
        /home/admin/.cache/tools/go-1.17.8/src/runtime/traceback.go:350 +0xac3 fp=0x7fcdecffc9d8 sp=0x7fcdecffc668 pc=0x461a43
runtime.scanstack(0xc007257520, 0xc000032698)
        /home/admin/.cache/tools/go-1.17.8/src/runtime/mgcmark.go:748 +0x197 fp=0x7fcdecffcbd8 sp=0x7fcdecffc9d8 pc=0x425097
runtime.markroot.func1()
        /home/admin/.cache/tools/go-1.17.8/src/runtime/mgcmark.go:232 +0xb1 fp=0x7fcdecffcc20 sp=0x7fcdecffcbd8 pc=0x423fd1
runtime.markroot(0xc000032698, 0x2a6)
        /home/admin/.cache/tools/go-1.17.8/src/runtime/mgcmark.go:205 +0x170 fp=0x7fcdecffcca0 sp=0x7fcdecffcc20 pc=0x423d90
runtime.gcDrain(0xc000032698, 0x3)
        /home/admin/.cache/tools/go-1.17.8/src/runtime/mgcmark.go:1013 +0x379 fp=0x7fcdecffccf8 sp=0x7fcdecffcca0 pc=0x425bf9
runtime.gcBgMarkWorker.func2()
        /home/admin/.cache/tools/go-1.17.8/src/runtime/mgc.go:1269 +0xa5 fp=0x7fcdecffcd48 sp=0x7fcdecffccf8 pc=0x422c45
runtime.systemstack()
        /home/admin/.cache/tools/go-1.17.8/src/runtime/asm_amd64.s:383 +0x49 fp=0x7fcdecffcd50 sp=0x7fcdecffcd48 pc=0x46d1c9

goroutine 11 [GC worker (idle)]:
runtime.systemstack_switch()
        /home/admin/.cache/tools/go-1.17.8/src/runtime/asm_amd64.s:350 fp=0xc000050f60 sp=0xc000050f58 pc=0x46d160
runtime.gcBgMarkWorker()
        /home/admin/.cache/tools/go-1.17.8/src/runtime/mgc.go:1256 +0x1b1 fp=0xc000050fe0 sp=0xc000050f60 pc=0x4228f1
runtime.goexit()
        /home/admin/.cache/tools/go-1.17.8/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc000050fe8 sp=0xc000050fe0 pc=0x46f381
created by runtime.gcBgMarkStartWorkers
        /home/admin/.cache/tools/go-1.17.8/src/runtime/mgc.go:1124 +0x25

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x2 addr=0x7f82cb5f7b0c pc=0x424654]

runtime stack:
runtime.throw({0x9dccf8, 0x43a4be})
        /home/admin/.cache/tools/go-1.17.5/src/runtime/panic.go:1198 +0x71
runtime.sigpanic()
        /home/admin/.cache/tools/go-1.17.5/src/runtime/signal_unix.go:719 +0x396
runtime.(*mspan).isFree(...)
        /home/admin/.cache/tools/go-1.17.5/src/runtime/mbitmap.go:226
runtime.scanConservative(0xc004a53a68, 0x178, 0x0, 0xca7fe8, 0x7f82a113ea50)
        /home/admin/.cache/tools/go-1.17.5/src/runtime/mgcmark.go:1374 +0x134
runtime.scanframeworker(0x7f82a113e970, 0x7f82a113ea50, 0xfcf2e0)
        /home/admin/.cache/tools/go-1.17.5/src/runtime/mgcmark.go:886 +0x158
runtime.scanstack.func1(0xca7fe8, 0xfcf2e0)
        /home/admin/.cache/tools/go-1.17.5/src/runtime/mgcmark.go:745 +0x25
runtime.gentraceback(0x10, 0x9eca18, 0x0, 0x7f82a113ea50, 0x0, 0x0, 0x7fffffff, 0x7f82a113ebb0, 0x7fffffff, 0x0)
        /home/admin/.cache/tools/go-1.17.5/src/runtime/traceback.go:350 +0xac3
runtime.scanstack(0xc000267860, 0xc00002b698)
        /home/admin/.cache/tools/go-1.17.5/src/runtime/mgcmark.go:748 +0x197
runtime.markroot.func1()
        /home/admin/.cache/tools/go-1.17.5/src/runtime/mgcmark.go:232 +0xb1
runtime.markroot(0xc00002b698, 0x59)
        /home/admin/.cache/tools/go-1.17.5/src/runtime/mgcmark.go:205 +0x170
runtime.gcDrain(0xc00002b698, 0x3)
        /home/admin/.cache/tools/go-1.17.5/src/runtime/mgcmark.go:1013 +0x379
runtime.gcBgMarkWorker.func2()
        /home/admin/.cache/tools/go-1.17.5/src/runtime/mgc.go:1269 +0xa5
runtime.systemstack()
        /home/admin/.cache/tools/go-1.17.5/src/runtime/asm_amd64.s:383 +0x49

goroutine 52 [GC worker (idle)]:
runtime.systemstack_switch()
        /home/admin/.cache/tools/go-1.17.5/src/runtime/asm_amd64.s:350 fp=0xc0001ebf60 sp=0xc0001ebf58 pc=0x4669e0
runtime.gcBgMarkWorker()
        /home/admin/.cache/tools/go-1.17.5/src/runtime/mgc.go:1256 +0x1b1 fp=0xc0001ebfe0 sp=0xc0001ebf60 pc=0x420a91
runtime.goexit()
        /home/admin/.cache/tools/go-1.17.5/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc0001ebfe8 sp=0xc0001ebfe0 pc=0x468c01
created by runtime.gcBgMarkStartWorkers
        /home/admin/.cache/tools/go-1.17.5/src/runtime/mgc.go:1124 +0x25

@mknyszek We caught similar crashes with Go 1.17.5 and 1.17.8. We use cgo, and our cgo code makes shared-memory calls such as mmap. Could these be causing the crash?
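The traces alone don't show whether mmap is involved. One thing that may be worth checking: the cgo pointer-passing rules say Go code may not store a Go pointer in C memory, and the garbage collector does not scan memory it didn't allocate, such as an mmap'd region. A Go object referenced only from such memory can therefore be freed and reused, which could later surface as heap corruption. `GODEBUG=cgocheck=2` (replaced by `GOEXPERIMENT=cgocheck2` in Go 1.21) enables more expensive checks of these rules. The sketch below shows one conservative pattern under those assumptions: shared memory holds only integer handles, never Go pointers. `handleTable` and its methods are hypothetical; `runtime/cgo.Handle` (Go 1.17+) provides the same idea in the standard library. The `syscall.Mmap` call assumes a Unix-like system.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sync"
	"syscall"
)

// handleTable is a hypothetical registry that maps integer handles to Go
// values, so memory outside the Go heap only ever stores plain integers.
type handleTable struct {
	mu   sync.Mutex
	next uint64
	vals map[uint64]any
}

func newHandleTable() *handleTable {
	return &handleTable{next: 1, vals: make(map[uint64]any)}
}

// put registers v and returns its handle; the table keeps v reachable.
func (t *handleTable) put(v any) uint64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	h := t.next
	t.next++
	t.vals[h] = v
	return h
}

// get looks up the value for handle h.
func (t *handleTable) get(h uint64) (any, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	v, ok := t.vals[h]
	return v, ok
}

// release drops the table's reference so the value can be collected.
func (t *handleTable) release(h uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.vals, h)
}

// storeHandle and loadHandle keep the handle in the first 8 bytes of buf.
func storeHandle(buf []byte, h uint64) { binary.LittleEndian.PutUint64(buf, h) }
func loadHandle(buf []byte) uint64     { return binary.LittleEndian.Uint64(buf) }

func main() {
	// Anonymous private mapping: this memory is not part of the Go heap.
	region, err := syscall.Mmap(-1, 0, 4096,
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		fmt.Println("mmap:", err)
		return
	}
	defer syscall.Munmap(region)

	tbl := newHandleTable()
	h := tbl.put("session state")
	storeHandle(region, h) // only an integer goes into the mapping
	v, ok := tbl.get(loadHandle(region))
	fmt.Println(ok, v)
	tbl.release(h)
}
```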