Not sure it is valuable information, but, as can be seen in the runtime stack trace, the source of the schedule() call is mcall(park_m), which is done by gopark(). This is always the case.
Thanks for the report. Do you have crash output with that extra logging added to checkdead?
Hey @prattmic, thank you for looking into it. I have been considering if it is possible that the netpoll part in findRunnable is just pushing readied G's to the global run queue because it couldn't find an idle P (with injectglist), then it goes straight to stopm, and up the stack checkdead finds them with no active M.
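To make that hypothesis concrete, here is a heavily simplified sketch of the tail of findRunnable as I read it (pseudocode for illustration only, not the actual runtime source; names and conditions are approximate):

// Simplified pseudocode of the end of findRunnable (approximate, illustration only).
// At this point every run queue looked empty and this M has already released its P.
if list := netpoll(0); !list.empty() { // final network poll before parking
    lock(&sched.lock)
    pp, _ := pidleget(0) // try to take an idle P back for the readied G's
    unlock(&sched.lock)
    if pp == nil {
        // No idle P available: the readied G's only land on the global run queue.
        injectglist(&list)
        // If no other M picks them up before this M parks below, checkdead
        // (reached via stopm -> mput) sees runnable G's with no running M.
    }
}
stopm() // park this M; mput -> checkdead runs under sched.lock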
Currently working on obtaining a core dump. Here is an example of output with the patched runtime, from the forEachG loop:
[checkdead] find g 47 in status 1 - mcount=9 sched.nmidle=8 sched.nmidlelocked=0 sched.nmsys=1 sched.nmspinning=0
[checkdead] find g 54 in status 1 - mcount=9 sched.nmidle=8 sched.nmidlelocked=0 sched.nmsys=1 sched.nmspinning=0
[checkdead] find g 61 in status 1 - mcount=9 sched.nmidle=8 sched.nmidlelocked=0 sched.nmsys=1 sched.nmspinning=0
runtime stack (line numbers are off, patched version)
[checkdead] runtime stack
runtime.mput(0xc000bc1e40?)
#011/usr/local/go/src/runtime/proc.go:5766 +0x33
runtime.stopm()
#011/usr/local/go/src/runtime/proc.go:2332 +0x67
runtime.findRunnable()
#011/usr/local/go/src/runtime/proc.go:3007 +0xabc
runtime.schedule()
#011/usr/local/go/src/runtime/proc.go:3360 +0xb1
runtime.park_m(0xc000007a00?)
#011/usr/local/go/src/runtime/proc.go:3511 +0x12d
runtime.mcall()
#011/usr/local/go/src/runtime/asm_amd64.s:452 +0x43
A listing of the goroutines - some goroutine bloat was added in the hope of triggering the issue faster. It is interesting to see that the three runnable G's here were in netpoll:
[checkdead] goroutines
goroutine 1 [IO wait]:
goroutine 6 [select]:
goroutine 15 [select]:
goroutine 47 [runnable]:
internal/poll.runtime_pollWait(0x7f794a149868, 0x72)
#011/usr/local/go/src/runtime/netpoll.go:306 +0x89
internal/poll.(*pollDesc).wait(0xc00038b800?, 0xc0008ca000?, 0x0)
#011/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
#011/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00038b800, {0xc0008ca000, 0x8000, 0x8000})
#011/usr/local/go/src/internal/poll/fd_unix.go:167 +0x299
net.(*netFD).Read(0xc00038b800, {0xc0008ca000?, 0x1060100000000?, 0x8?})
#011/usr/local/go/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc0000124e0, {0xc0008ca000?, 0x4063c0?, 0xc00130d6e0?})
#011/usr/local/go/src/net/net.go:183 +0x45
bufio.(*Reader).Read(0xc0003fe240, {0xc000869b60, 0x9, 0x7f79714cf108?})
#011/usr/local/go/src/bufio/bufio.go:237 +0x1bb
io.ReadAtLeast({0x1373580, 0xc0003fe240}, {0xc000869b60, 0x9, 0x9}, 0x9)
#011/usr/local/go/src/io/io.go:332 +0x9a
io.ReadFull(...)
#011/usr/local/go/src/io/io.go:351
golang.org/x/net/http2.readFrameHeader({0xc000869b60?, 0x9?, 0xbcd94fcc29a?}, {0x1373580?, 0xc0003fe240?})
#011/Users/luciencoffe/go/pkg/mod/golang.org/x/net@v0.8.0/http2/frame.go:237 +0x6e
golang.org/x/net/http2.(*Framer).ReadFrame(0xc000869b20)
#011/Users/luciencoffe/go/pkg/mod/golang.org/x/net@v0.8.0/http2/frame.go:498 +0x95
google.golang.org/grpc/internal/transport.(*http2Client).reader(0xc0000a0b40, 0x0?)
#011/Users/luciencoffe/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/http2_client.go:1592 +0x257
created by google.golang.org/grpc/internal/transport.newHTTP2Client
#011/Users/luciencoffe/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/http2_client.go:394 +0x1e2a
goroutine 4473675 [select]:
goroutine 4473942 [select]:
goroutine 50 [select, 216 minutes]:
goroutine 49 [select]:
goroutine 43 [select, 216 minutes]:
goroutine 48 [select]:
goroutine 56 [select]:
goroutine 57 [select, 216 minutes]:
goroutine 54 [runnable]:
internal/poll.runtime_pollWait(0x7f794a149688, 0x72)
#011/usr/local/go/src/runtime/netpoll.go:306 +0x89
internal/poll.(*pollDesc).wait(0xc00038b980?, 0xc0001a0000?, 0x0)
#011/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
#011/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00038b980, {0xc0001a0000, 0x8000, 0x8000})
#011/usr/local/go/src/internal/poll/fd_unix.go:167 +0x299
net.(*netFD).Read(0xc00038b980, {0xc0001a0000?, 0xc00130d8c0?, 0xc0008f1cd8?})
#011/usr/local/go/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc000012570, {0xc0001a0000?, 0x4a919a?, 0xc0003fe780?})
#011/usr/local/go/src/net/net.go:183 +0x45
bufio.(*Reader).Read(0xc0003fe780, {0xc0005d02ec, 0x1d14, 0x9?})
#011/usr/local/go/src/bufio/bufio.go:237 +0x1bb
io.ReadAtLeast({0x1373580, 0xc0003fe780}, {0xc0005ce000, 0x4000, 0x4000}, 0x4000)
#011/usr/local/go/src/io/io.go:332 +0x9a
io.ReadFull(...)
#011/usr/local/go/src/io/io.go:351
golang.org/x/net/http2.(*Framer).ReadFrame(0xc000869c00)
#011/Users/luciencoffe/go/pkg/mod/golang.org/x/net@v0.8.0/http2/frame.go:506 +0x125
google.golang.org/grpc/internal/transport.(*http2Client).reader(0xc0000a0d80, 0x0?)
#011/Users/luciencoffe/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/http2_client.go:1592 +0x257
created by google.golang.org/grpc/internal/transport.newHTTP2Client
#011/Users/luciencoffe/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/http2_client.go:394 +0x1e2a
goroutine 55 [select]:
goroutine 63 [select]:
goroutine 61 [runnable]:
internal/poll.runtime_pollWait(0x7f794a149598, 0x72)
#011/usr/local/go/src/runtime/netpoll.go:306 +0x89
internal/poll.(*pollDesc).wait(0xc00038bb00?, 0xc0001c0000?, 0x0)
#011/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32
internal/poll.(*pollDesc).waitRead(...)
#011/usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc00038bb00, {0xc0001c0000, 0x8000, 0x8000})
#011/usr/local/go/src/internal/poll/fd_unix.go:167 +0x299
net.(*netFD).Read(0xc00038bb00, {0xc0001c0000?, 0x1060100000000?, 0x8?})
#011/usr/local/go/src/net/fd_posix.go:55 +0x29
net.(*conn).Read(0xc0000125c0, {0xc0001c0000?, 0x4063c0?, 0xc00266c540?})
#011/usr/local/go/src/net/net.go:183 +0x45
bufio.(*Reader).Read(0xc0003fecc0, {0xc000869d20, 0x9, 0x7f79714cf108?})
#011/usr/local/go/src/bufio/bufio.go:237 +0x1bb
io.ReadAtLeast({0x1373580, 0xc0003fecc0}, {0xc000869d20, 0x9, 0x9}, 0x9)
#011/usr/local/go/src/io/io.go:332 +0x9a
io.ReadFull(...)
#011/usr/local/go/src/io/io.go:351
golang.org/x/net/http2.readFrameHeader({0xc000869d20?, 0x9?, 0xbcd95094c0c?}, {0x1373580?, 0xc0003fecc0?})
#011/Users/luciencoffe/go/pkg/mod/golang.org/x/net@v0.8.0/http2/frame.go:237 +0x6e
golang.org/x/net/http2.(*Framer).ReadFrame(0xc000869ce0)
#011/Users/luciencoffe/go/pkg/mod/golang.org/x/net@v0.8.0/http2/frame.go:498 +0x95
google.golang.org/grpc/internal/transport.(*http2Client).reader(0xc0000a0fc0, 0x0?)
#011/Users/luciencoffe/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/http2_client.go:1592 +0x257
created by google.golang.org/grpc/internal/transport.newHTTP2Client
#011/Users/luciencoffe/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/http2_client.go:394 +0x1e2a
goroutine 62 [select]:
goroutine 65 [IO wait, 216 minutes]:
goroutine 66 [select]:
goroutine 67 [select]:
goroutine 4473857 [select]:
goroutine 4473552 [chan receive]:
goroutine 4473423 [IO wait]:
goroutine 4473770 [chan receive]:
goroutine 4473611 [select]:
goroutine 4473640 [chan receive]:
goroutine 4473612 [IO wait]:
goroutine 4473830 [select]:
goroutine 4473639 [chan receive]:
goroutine 4473745 [chan receive]:
goroutine 4473561 [chan receive]:
goroutine 4473753 [select]:
goroutine 4473529 [chan receive]:
goroutine 4473694 [IO wait]:
goroutine 4473697 [select]:
goroutine 4473759 [IO wait]:
goroutine 4473906 [select]:
goroutine 4473719 [select]:
goroutine 4473892 [select]:
goroutine 4473560 [chan receive]:
goroutine 4473856 [IO wait]:
goroutine 4473784 [select]:
goroutine 4473908 [select]:
goroutine 4473819 [select]:
goroutine 4473417 [IO wait]:
goroutine 4473747 [select]:
goroutine 4473778 [select]:
goroutine 4459635 [chan receive]:
goroutine 4473689 [IO wait]:
goroutine 4473774 [chan receive]:
goroutine 4473601 [select]:
goroutine 4473806 [select]:
goroutine 4473420 [IO wait]:
goroutine 4473741 [chan receive]:
goroutine 4473699 [IO wait]:
goroutine 4473839 [select]:
goroutine 4473912 [select]:
goroutine 4473803 [IO wait]:
goroutine 4473430 [IO wait]:
goroutine 4473418 [IO wait]:
goroutine 4473837 [select]:
goroutine 4473426 [IO wait]:
goroutine 4473577 [select]:
goroutine 4473744 [chan receive]:
goroutine 4473645 [chan receive]:
goroutine 4473615 [select]:
goroutine 4473865 [select]:
goroutine 4473840 [IO wait]:
goroutine 4451270 [select]:
goroutine 4473727 [chan receive]:
goroutine 4473595 [select]:
goroutine 4473556 [chan receive]:
goroutine 4473516 [chan receive]:
goroutine 4473682 [select]:
goroutine 4473848 [IO wait]:
goroutine 4473740 [chan receive]:
goroutine 4473427 [IO wait]:
goroutine 4451268 [select]:
goroutine 4473820 [IO wait]:
goroutine 4473799 [IO wait]:
goroutine 4473583 [select]:
goroutine 4473695 [select]:
goroutine 4473754 [IO wait]:
goroutine 4473651 [select]:
goroutine 4473794 [select]:
goroutine 4473553 [chan receive]:
goroutine 4473528 [chan receive]:
goroutine 4473910 [select]:
goroutine 4473900 [select]:
goroutine 4473643 [chan receive]:
goroutine 4473847 [select]:
goroutine 4473584 [IO wait]:
goroutine 4473581 [select]:
goroutine 4473913 [IO wait]:
goroutine 4473419 [IO wait]:
goroutine 4473654 [IO wait]:
goroutine 4473428 [IO wait]:
goroutine 4473700 [select]:
goroutine 4473821 [select]:
goroutine 4473435 [IO wait]:
goroutine 4473730 [chan receive]:
goroutine 4473726 [chan receive]:
goroutine 4473532 [chan receive]:
goroutine 4473713 [select]:
goroutine 4473619 [select]:
goroutine 4473816 [IO wait]:
goroutine 4473563 [chan receive]:
goroutine 4459733 [select]:
goroutine 4473702 [select]:
goroutine 4473849 [select]:
goroutine 4473513 [chan receive]:
goroutine 4473928 [IO wait]:
goroutine 4473888 [select]:
goroutine 4473568 [IO wait]:
goroutine 4473655 [select]:
goroutine 4473672 [select]:
goroutine 4473859 [select]:
goroutine 4473668 [select]:
goroutine 4473914 [select]:
goroutine 4473680 [select]:
goroutine 4473766 [chan receive]:
goroutine 4473931 [select]:
goroutine 4473863 [select]:
goroutine 4473706 [IO wait]:
goroutine 4473933 [IO wait]:
goroutine 4473732 [chan receive]:
goroutine 4473623 [select]:
goroutine 4473567 [select]:
goroutine 4473894 [select]:
goroutine 4473531 [chan receive]:
goroutine 4473608 [IO wait]:
goroutine 4473605 [select]:
goroutine 4473768 [chan receive]:
goroutine 4473432 [IO wait]:
goroutine 4473665 [IO wait]:
goroutine 4473781 [IO wait]:
goroutine 4473554 [chan receive]:
goroutine 4473889 [IO wait]:
goroutine 4473739 [chan receive]:
goroutine 4473616 [IO wait]:
goroutine 4473517 [chan receive]:
goroutine 4473720 [IO wait]:
goroutine 4473621 [select]:
goroutine 4473755 [select]:
goroutine 4473551 [chan receive]:
goroutine 4473653 [select]:
goroutine 4473750 [IO wait]:
goroutine 4473716 [IO wait]:
goroutine 4473434 [IO wait]:
goroutine 4473562 [chan receive]:
goroutine 4473646 [IO wait]:
goroutine 4473674 [IO wait]:
goroutine 4473776 [chan receive]:
goroutine 4473571 [select]:
goroutine 4473802 [select]:
goroutine 4451151 [chan receive]:
goroutine 4473836 [IO wait]:
goroutine 4473934 [select]:
goroutine 4473826 [select]:
goroutine 4473421 [IO wait]:
goroutine 4473844 [IO wait]:
goroutine 4473692 [select]:
goroutine 4473898 [select]:
goroutine 4473617 [select]:
goroutine 4473880 [select]:
goroutine 4473555 [chan receive]:
goroutine 4451267 [IO wait]:
goroutine 4473670 [select]:
goroutine 4473723 [select]:
goroutine 4473607 [select]:
goroutine 4473909 [IO wait]:
goroutine 4473424 [IO wait]:
goroutine 4473751 [select]:
goroutine 4473780 [select]:
goroutine 4473885 [IO wait]:
goroutine 4459735 [select]:
goroutine 4473588 [IO wait]:
goroutine 4473729 [chan receive]:
goroutine 4473890 [select]:
goroutine 4473743 [chan receive]:
goroutine 4473686 [select]:
goroutine 4473728 [chan receive]:
goroutine 4473592 [IO wait]:
goroutine 4473772 [chan receive]:
goroutine 4473832 [select]:
goroutine 4473886 [select]:
goroutine 4473804 [select]:
goroutine 4473690 [select]:
goroutine 4473841 [select]:
goroutine 4473917 [chan receive]:
goroutine 4473916 [select]:
goroutine 4473569 [select]:
goroutine 4473596 [IO wait]:
goroutine 4473580 [IO wait]:
goroutine 4473735 [chan receive]:
goroutine 4473717 [select]:
goroutine 4473795 [IO wait]:
goroutine 4473773 [chan receive]:
goroutine 4473657 [select]:
goroutine 4473597 [select]:
goroutine 4473515 [chan receive]:
goroutine 4473565 [select]:
goroutine 4473852 [IO wait]:
goroutine 4473864 [IO wait]:
goroutine 4473613 [select]:
goroutine 4473845 [select]:
goroutine 4473893 [IO wait]:
goroutine 4473709 [select]:
goroutine 4473828 [select]:
goroutine 4473514 [chan receive]:
goroutine 4473660 [select]:
goroutine 4473600 [IO wait]:
goroutine 4473742 [chan receive]:
goroutine 4473604 [IO wait]:
goroutine 4473896 [select]:
goroutine 4473777 [IO wait]:
goroutine 4473940 [select]:
goroutine 4473929 [select]:
goroutine 4473817 [select]:
goroutine 4473851 [select]:
goroutine 4473591 [select]:
goroutine 4473782 [select]:
goroutine 4473431 [IO wait]:
goroutine 4473771 [chan receive]:
goroutine 4473662 [select]:
goroutine 4473731 [chan receive]:
goroutine 4473650 [IO wait]:
goroutine 4473593 [select]:
goroutine 4473603 [select]:
goroutine 4473585 [select]:
goroutine 4473823 [select]:
goroutine 4473757 [select]:
goroutine 4473884 [select]:
goroutine 4473829 [IO wait]:
goroutine 4473877 [IO wait]:
goroutine 4473416 [IO wait]:
goroutine 4473775 [chan receive]:
goroutine 4473649 [select]:
goroutine 4473787 [select]:
goroutine 4473902 [select]:
goroutine 4473519 [chan receive]:
goroutine 4473558 [chan receive]:
goroutine 4473641 [chan receive]:
goroutine 4473422 [IO wait]:
goroutine 4473861 [select]:
goroutine 4473936 [select]:
goroutine 4473589 [select]:
goroutine 4473644 [chan receive]:
goroutine 4473767 [chan receive]:
goroutine 4473557 [chan receive]:
goroutine 4473904 [select]:
goroutine 4473710 [IO wait]:
goroutine 4473677 [select]:
goroutine 4473878 [select]:
goroutine 4473905 [IO wait]:
goroutine 4473620 [IO wait]:
goroutine 4473599 [select]:
goroutine 4473539 [chan receive]:
goroutine 4473789 [select]:
goroutine 4473659 [IO wait]:
goroutine 4473810 [select]:
goroutine 4473666 [select]:
goroutine 4473901 [IO wait]:
goroutine 4473733 [chan receive]:
goroutine 4473798 [select]:
goroutine 4473573 [select]:
goroutine 4473867 [select]:
goroutine 4473579 [select]:
goroutine 4473738 [chan receive]:
goroutine 4473853 [select]:
goroutine 4473685 [IO wait]:
goroutine 4473762 [select]:
goroutine 4473737 [chan receive]:
goroutine 4473707 [select]:
goroutine 4473938 [chan receive]:
goroutine 4473642 [chan receive]:
goroutine 4473791 [IO wait]:
goroutine 4473760 [select]:
goroutine 4473749 [select]:
goroutine 4473786 [IO wait]:
goroutine 4473540 [chan receive]:
goroutine 4473825 [IO wait]:
goroutine 4473575 [select]:
goroutine 4473572 [IO wait]:
goroutine 4473587 [select]:
goroutine 4473576 [IO wait]:
goroutine 4473769 [chan receive]:
goroutine 4473746 [IO wait]:
goroutine 4473881 [IO wait]:
goroutine 4473800 [select]:
goroutine 4473792 [select]:
goroutine 4473736 [chan receive]:
goroutine 4459732 [IO wait]:
goroutine 4473843 [select]:
goroutine 4473939 [IO wait]:
goroutine 4473688 [select]:
goroutine 4473433 [IO wait]:
goroutine 4473647 [select]:
goroutine 4473425 [IO wait]:
goroutine 4473897 [IO wait]:
goroutine 4473860 [IO wait]:
goroutine 4473711 [select]:
goroutine 4473882 [select]:
goroutine 4473679 [IO wait]:
goroutine 4473796 [select]:
goroutine 4473734 [chan receive]:
goroutine 4473537 [chan receive]:
goroutine 4473809 [IO wait]:
goroutine 4473812 [select]:
goroutine 4473564 [IO wait]:
goroutine 4473669 [IO wait]:
goroutine 4473609 [select]:
goroutine 4473429 [IO wait]:
goroutine 4473520 [chan receive]:
goroutine 4473559 [chan receive]:
goroutine 4473855 [select]:
goroutine 4473721 [select]:
goroutine 4473518 [chan receive]:
I have been considering if it is possible that the netpoll part in findRunnable is just pushing readied G's to the global run queue because it couldn't find an idle P (with injectglist), then it goes straight to stopm, and up the stack checkdead finds them with no active M.
Normally I'd say it seems unlikely to lose a race against all of the Ps like this, but since this is GOMAXPROCS=1 that makes it more plausible.
Currently working on obtaining a core dump.
A core dump should provide more insight. Consider building from a tip toolchain and setting GODEBUG=dontfreezetheworld=1, which was just added a few weeks ago. Normally the scheduler tries to stop all goroutines when throwing, which can obscure the scheduler state in a core dump. This flag will skip that behavior.
Setting GODEBUG=schedtrace=1000 will also add a scheduler state dump on throw, not dissimilar from the logging you added. (Multiple GODEBUG options can be set, comma-separated.)
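For example, both options can be combined in a single GODEBUG value (the binary name below is just a placeholder):

GODEBUG=dontfreezetheworld=1,schedtrace=1000 ./your-program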
I don't suppose you can share a program that reproduces this behavior? Do you know if this reproduced on older versions of Go?
This happens on m3.medium AWS instances. I could not reproduce it anywhere else so far. This is a single-threaded amd64 instance.
To clarify, you cannot reproduce elsewhere even when explicitly setting GOMAXPROCS=1?
To clarify, you cannot reproduce elsewhere even when explicitly setting GOMAXPROCS=1?
I was not able to reproduce it on other instance types so far (i.e., forcing GOMAXPROCS=1 in a multithreaded environment). I should try again with the things I learned in the interim.
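For those reproduction attempts on multi-core machines, a single P can be forced either via the GOMAXPROCS environment variable or programmatically; a trivial sketch (the actual workload is omitted):

package main

import "runtime"

func main() {
    // Restrict the scheduler to a single P, mimicking the single-vCPU
    // m3.medium environment where the crash is observed.
    runtime.GOMAXPROCS(1)

    // ... start the gRPC / network-heavy workload here ...
}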
I don't suppose you can share a program that reproduces this behavior? Do you know if this reproduced on older versions of Go?
I have no minimal reproducer that I can share yet. I have not tried older versions yet, and in any case I cannot go back past 1.19 (would that still be useful?).
A core dump should provide more insight. Consider building from a tip toolchain and setting GODEBUG=dontfreezetheworld=1, which was just added a few weeks ago. Normally the scheduler tries to stop all goroutines when throwing, which can obscure the scheduler state in a core dump. This flag will skip that behavior.
Thank you for your suggestions. I tried to get a core dump with GODEBUG=dontfreezetheworld=1, but the process hung. The checkdead debug line was printed, and the process just stayed there. After ~40 min, I concluded it wouldn't quit, so I took a core dump with gcore and killed it. That core dump did not help.
Still looking at the core dumps I got so far. I am trying to figure out how to share more context.
After reviewing enough core dumps, I could see that the threads are consistently in a state that indicates a race on sched.lock. This issue stood out, although it is not exactly the same race condition happening here. According to the stack traces below, M1 is crashing in checkdead, and M2 is blocked acquiring sched.lock in startm. Let's keep this definition of M1 and M2 for later reference. Line numbers are off, as explained later on:
M1
0 0x000000000046f021 in runtime.raise
at /usr/local/go/src/runtime/sys_linux_amd64.s:154
1 0x000000000044f0a5 in runtime.dieFromSignal
at /usr/local/go/src/runtime/signal_unix.go:879
2 0x000000000044f646 in runtime.sigfwdgo
at /usr/local/go/src/runtime/signal_unix.go:1092
3 0x000000000044dd67 in runtime.sigtrampgo
at /usr/local/go/src/runtime/signal_unix.go:432
4 0x000000000046f326 in runtime.sigtramp
at /usr/local/go/src/runtime/sys_linux_amd64.s:354
5 0x000000000046f420 in runtime.sigreturn
at /usr/local/go/src/runtime/sys_linux_amd64.s:468
6 0x000000000044f2b8 in runtime.crash
at /usr/local/go/src/runtime/signal_unix.go:971
7 0x0000000000437851 in runtime.fatalthrow.func1
at /usr/local/go/src/runtime/panic.go:1127
8 0x00000000004377cc in runtime.fatalthrow
at /usr/local/go/src/runtime/panic.go:1120
9 0x000000000043747d in runtime.throw
at /usr/local/go/src/runtime/panic.go:1047
10 0x0000000000445dcb in runtime.checkdead.func1
at /usr/local/go/src/runtime/proc.go:5397
11 0x000000000043ac3c in runtime.forEachG
at /usr/local/go/src/runtime/proc.go:591
12 0x000000000044594f in runtime.checkdead
at /usr/local/go/src/runtime/proc.go:5376
13 0x0000000000447233 in runtime.mput
at /usr/local/go/src/runtime/proc.go:5968
14 0x000000000043e389 in runtime.stopm
at /usr/local/go/src/runtime/proc.go:2382
M2 (always waking up scavengerState)
0 0x000000000046f5c3 in runtime.futex
at /usr/local/go/src/runtime/sys_linux_amd64.s:555
1 0x0000000000433556 in runtime.futexsleep
at /usr/local/go/src/runtime/os_linux.go:69
2 0x000000000040c2d9 in runtime.lock2
at /usr/local/go/src/runtime/lock_futex.go:107
3 0x000000000043e4d4 in runtime.lockWithRank
at /usr/local/go/src/runtime/lockrank_off.go:22
4 0x000000000043e4d4 in runtime.lock
at /usr/local/go/src/runtime/lock_futex.go:48
5 0x000000000043e4d4 in runtime.startm
at /usr/local/go/src/runtime/proc.go:2428
6 0x0000000000465732 in runtime.injectglist.func1
at /usr/local/go/src/runtime/proc.go:3422
7 0x0000000000440cfc in runtime.injectglist
at /usr/local/go/src/runtime/proc.go:3436
8 0x0000000000422725 in runtime.(*scavengerState).wake
at /usr/local/go/src/runtime/mgcscavenge.go:428
9 0x0000000000446205 in runtime.sysmon
at /usr/local/go/src/runtime/proc.go:5655
10 0x000000000043cb93 in runtime.mstart1
at /usr/local/go/src/runtime/proc.go:1513
11 0x000000000043cada in runtime.mstart0
at /usr/local/go/src/runtime/proc.go:1470
12 0x000000000046b525 in runtime.mstart
at /usr/local/go/src/runtime/asm_amd64.s:395
13 0x000000000046f66f in runtime.clone
at /usr/local/go/src/runtime/sys_linux_amd64.s:615
I concluded that something is blocking M2 from starting an M before stopm is called in M1. To understand what's happening with the lock before entering this state, I added traces around all lock(&sched.lock) and unlock(&sched.lock) calls. Here is the trace when checkdead panics (not using the exact commit here, but using the original throw in checkdead to get a core dump):
eventtrace: t=45456676411335, g=0 m=3, findRunnable acquiring sched lock (2)
eventtrace: t=45456676412160, g=0 m=3, findRunnable acquired sched lock (2)
eventtrace: t=45456676412966, g=0 m=3, findRunnable releasing sched lock (2)
eventtrace: t=45456676413678, g=0 m=3, findRunnable released sched lock (2)
eventtrace: t=45456676431199, g=0 m=3, findRunnable acquiring sched lock (3)
eventtrace: t=45456676431945, g=0 m=3, findRunnable acquired sched lock (3)
eventtrace: t=45456676433067, g=0 m=3, findRunnable releasing sched lock (6)
eventtrace: t=45456676433798, g=0 m=3, findRunnable released sched lock (6)
eventtrace: t=45456676664081, g=0 m=3, findRunnable acquiring sched lock (4)
eventtrace: t=45456676665287, g=0 m=3, findRunnable acquired sched lock (4)
eventtrace: t=45456676666597, g=0 m=3, findRunnable releasing sched lock (7)
eventtrace: t=45456676667355, g=0 m=3, findRunnable released sched lock (7)
eventtrace: t=45456676697198, g=0 m=3, findRunnable acquiring sched lock (3)
eventtrace: t=45456676698169, g=0 m=3, findRunnable acquired sched lock (3)
eventtrace: t=45456676699049, g=0 m=3, findRunnable releasing sched lock (6)
eventtrace: t=45456676699769, g=0 m=3, findRunnable released sched lock (6)
eventtrace: t=45456690354570, g=0 m=3, findRunnable acquiring sched lock (4)
eventtrace: t=45456690355906, g=0 m=3, findRunnable acquired sched lock (4)
eventtrace: t=45456690357369, g=0 m=3, findRunnable releasing sched lock (7)
eventtrace: t=45456690358155, g=0 m=3, findRunnable released sched lock (7)
eventtrace: t=45456690401389, g=0 m=3, findRunnable acquiring sched lock (3)
eventtrace: t=45456690402194, g=0 m=3, findRunnable acquired sched lock (3)
eventtrace: t=45456690403173, g=0 m=3, findRunnable releasing sched lock (6)
eventtrace: t=45456690403902, g=0 m=3, findRunnable released sched lock (6)
eventtrace: t=45456690411975, g=0 m=1, sysmon acquiring sched lock (2)
eventtrace: t=45456690412994, g=0 m=1, sysmon acquired sched lock (2)
eventtrace: t=45456690414262, g=0 m=1, sysmon releasing sched lock (2)
eventtrace: t=45456690414987, g=0 m=1, sysmon released sched lock (2)
eventtrace: t=45456714142246, g=0 m=3, findRunnable acquiring sched lock (4)
eventtrace: t=45456714144224, g=0 m=3, findRunnable acquired sched lock (4)
eventtrace: t=45456714146004, g=0 m=3, findRunnable releasing sched lock (7)
eventtrace: t=45456714146984, g=0 m=3, findRunnable released sched lock (7)
eventtrace: t=45456714152494, g=0 m=3, entersyscall_sysmon acquiring sched lock
eventtrace: t=45456714153484, g=0 m=3, entersyscall_sysmon acquired sched lock
eventtrace: t=45456714157218, g=0 m=3, entersyscall_sysmon releasing sched lock
eventtrace: t=45456714158047, g=0 m=3, entersyscall_sysmon released sched lock
eventtrace: t=45456714211610, g=0 m=3, findRunnable acquiring sched lock (3)
eventtrace: t=45456714212617, g=0 m=3, findRunnable acquired sched lock (3)
eventtrace: t=45456714213942, g=0 m=3, findRunnable releasing sched lock (6)
eventtrace: t=45456714214770, g=0 m=3, findRunnable released sched lock (6)
eventtrace: t=45456714222397, g=0 m=1, sysmon acquiring sched lock (3)
eventtrace: t=45456714223447, g=0 m=1, sysmon acquired sched lock (3)
eventtrace: t=45456714224331, g=0 m=1, sysmon releasing sched lock (3)
eventtrace: t=45456714225123, g=0 m=1, sysmon released sched lock (3)
eventtrace: t=45456714231759, g=0 m=1, injectglist acquiring sched lock (2)
eventtrace: t=45456714232704, g=0 m=1, injectglist acquired sched lock (2)
eventtrace: t=45456714233694, g=0 m=1, injectglist releasing sched lock (3)
eventtrace: t=45456714234466, g=0 m=1, injectglist released sched lock (3)
eventtrace: t=45456714235606, g=0 m=1, injectglist acquiring sched lock (1)
eventtrace: t=45456714236386, g=0 m=1, injectglist acquired sched lock (1)
eventtrace: t=45456714237981, g=0 m=1, injectglist releasing sched lock (2)
eventtrace: t=45456714238734, g=0 m=1, injectglist released sched lock (2)
eventtrace: t=45456722489114, g=0 m=3, findRunnable acquiring sched lock (4)
eventtrace: t=45456722490381, g=0 m=3, findRunnable acquired sched lock (4)
eventtrace: t=45456722491302, g=0 m=3, findRunnable releasing sched lock (7)
eventtrace: t=45456722492095, g=0 m=3, findRunnable released sched lock (7)
eventtrace: t=45456722493297, g=0 m=3, injectglist acquiring sched lock (2)
eventtrace: t=45456722494024, g=0 m=3, injectglist acquired sched lock (2)
eventtrace: t=45456722494834, g=0 m=3, injectglist releasing sched lock (3)
eventtrace: t=45456722495587, g=0 m=3, injectglist released sched lock (3)
eventtrace: t=45456722496541, g=0 m=3, injectglist acquiring sched lock (1)
eventtrace: t=45456722497259, g=0 m=3, injectglist acquired sched lock (1)
eventtrace: t=45456722498017, g=0 m=3, injectglist releasing sched lock (1)
eventtrace: t=45456722498746, g=0 m=3, injectglist released sched lock (1)
eventtrace: t=45456722499544, g=0 m=3, findRunnable: no runnable G
eventtrace: t=45456722500515, g=0 m=3, stopm acquiring sched lock
eventtrace: t=45456722501299, g=0 m=3, stopm acquired sched lock
Here, m=1 is our M2 (scavengerState waking up), and m=3 is our M1 (checkdead panic).
It is a little clearer what's happening here: M2 is stealing the P (remember, single-threaded instance) from M1, then somehow yielding, making M1 push the runnable goroutines from netpoll to the global run queue and then panic in checkdead, since no M is available, M2 being stuck trying to acquire sched.lock.
It looks like M2 is preempted before being able to reacquire the scheduler lock in startm. But looking at injectglist, it seems like preemption is already disabled before calling startm, as intended:
startIdle := func(n int) {
for i := 0; i < n; i++ {
mp := acquirem() // See comment in startm.
pushEventTrace("injectglist acquiring sched lock (1)")
lock(&sched.lock)
pushEventTrace("injectglist acquired sched lock (1)")
pp, _ := pidlegetSpinning(0) // stealing the P from M1 here
if pp == nil {
pushEventTrace("injectglist releasing sched lock (1)")
unlock(&sched.lock)
pushEventTrace("injectglist released sched lock (1)")
releasem(mp)
break
}
pushEventTrace("injectglist releasing sched lock (2)")
unlock(&sched.lock)
pushEventTrace("injectglist released sched lock (2)")
startm(pp, false) // preempted?? mp.locks=5 at this stage
releasem(mp)
}
}
What could be happening here?
I am trying to solve this by making sure injectglist does not release the lock before calling startm. Waiting for results...
Thanks for continuing to dig. I'll look more closely at your logs in a bit.
I will note that checkdead() is called with sched.lock held, so it is possible that the threads stuck on sched.lock are a bit of a red herring and are just there because checkdead is holding the lock. OTOH, I could believe there is a real bug here.
Nice event trace log you added! For future reference, the runtime has similar functionality built in at runtime/debuglog.go (build with -tags debuglog to enable). Use looks like:
d := dlog()
if d != nil {
d.s("foo")
d.i(42)
d.end()
}
The buffer will be dumped on throw.
I am relatively new to the runtime package and would have appreciated discovering debuglog earlier (although I did learn about Go's write barriers in the process 🤦).
I agree that the state of M2 in the stack trace does not accurately represent the crash. I think it might not even be entering startm, although this detail is far from crucial.
We may have identified the issue. I previously mentioned an earlier fix regarding the conflicting relationship between startm and checkdead. The solution to our issue can be found within the startm code:
nmp := mget()
if nmp == nil {
// No M is available, we must drop sched.lock and call newm.
// However, we already own a P to assign to the M.
//
// Once sched.lock is released, another G (e.g., in a syscall),
// could find no idle P while checkdead finds a runnable G but
// no running M's because this new M hasn't started yet, thus
// throwing in an apparent deadlock.
//
// Avoid this situation by pre-allocating the ID for the new M,
// thus marking it as 'running' before we drop sched.lock. This
// new M will eventually run the scheduler to execute any
// queued G's.
id := mReserveID()
unlock(&sched.lock)
var fn func()
if spinning {
// The caller incremented nmspinning, so set m.spinning in the new M.
fn = mspinning
}
newm(fn, pp, id)
// Ownership transfer of pp committed by start in newm.
// Preemption is now safe.
releasem(mp)
return
}
The crux of the issue is that if the lock cannot be immediately reacquired at the beginning of startm, because another M grabs it first, the same problem can occur: checkdead runs in that window and finds runnable G's with no available M.
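To spell out the interleaving I believe we are hitting on this single-P machine (a schematic, not actual code):

// Schematic of the suspected interleaving with GOMAXPROCS=1 (illustration only).
//
// M2 (sysmon -> scavenger wake -> injectglist's startIdle):
//     lock(&sched.lock)
//     pp, _ := pidlegetSpinning(0)  // takes the only idle P
//     unlock(&sched.lock)           // <- window opens here
//     startm(pp, false)             // blocked trying to reacquire sched.lock
//
// M1 (findRunnable), inside that window:
//     netpoll returns ready G's, but no idle P is left to take back
//     injectglist pushes them onto the global run queue
//     stopm -> lock(&sched.lock) -> mput -> checkdead
//     checkdead sees runnable G's and zero running M's -> throw
//
// Keeping sched.lock held from pidlegetSpinning through the startm prologue
// closes the window: the P hand-off and the new M being accounted for are
// then observed atomically by checkdead.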
I propose a fix that introduces a new lockheld parameter to startm. This parameter governs all lock and unlock calls within startm. Additionally, a slight modification to the "newm case" reacquires the lock, ensuring the caller's expected state is maintained.
// in our case, the caller (injectglist) acquires the sched lock and calls startm with lockheld=true
func startm(pp *p, spinning, lockheld bool) {
mp := acquirem()
if !lockheld {
lock(&sched.lock)
}
[...]
nmp := mget()
if nmp == nil {
id := mReserveID()
unlock(&sched.lock) // <- unlock no matter what
var fn func()
if spinning {
// The caller incremented nmspinning, so set m.spinning in the new M.
fn = mspinning
}
newm(fn, pp, id)
if lockheld {
lock(&sched.lock) // <- reacquire lock if necessary
}
releasem(mp)
return
}
All existing call sites of startm should pass false for lockheld, except for one specific case: the startIdle function variable in injectglist would call startm without releasing the lock:
startIdle := func(n int) {
for i := 0; i < n; i++ {
mp := acquirem() // See comment in startm.
lock(&sched.lock)
pp, _ := pidlegetSpinning(0)
if pp == nil {
unlock(&sched.lock)
releasem(mp)
break
}
startm(pp, false, true)
unlock(&sched.lock)
releasem(mp)
}
}
I am still validating this fix.
Great find! I think you are right, this sounds like an issue. Would you like to send a CL?
Sure! I am sending a CL.
Regarding my previous attempt with GODEBUG=dontfreezetheworld=1, I revisited the core dump, and it appears that dontfreezetheworld is not the (direct) culprit. Instead, a deadlock in this specific case prevented the program from crashing and generating the core dump.
With schedtrace or scheddebug, an unusual behavior emerges. startpanic_m calls schedtrace(true) before checking for debug.dontfreezetheworld. However, schedtrace(true) tries to acquire sched.lock, which is already held in the checkdead context. I'm uncertain about what's happening, but this might be worth investigating further.
Here's the stack trace after around 40 minutes of deadlock in this specific instance:
0 0x00000000004751c3 in runtime.futex
at /Users/luciencoffe/sdk/gotip/src/runtime/sys_linux_amd64.s:558
1 0x00000000004390b0 in runtime.futexsleep
at /Users/luciencoffe/sdk/gotip/src/runtime/os_linux.go:69
2 0x000000000040bb45 in runtime.lock2
at /Users/luciencoffe/sdk/gotip/src/runtime/lock_futex.go:107
3 0x000000000040c373 in runtime.lockWithRank
at /Users/luciencoffe/sdk/gotip/src/runtime/lockrank_off.go:22
4 0x000000000040baa5 in runtime.lock
at /Users/luciencoffe/sdk/gotip/src/runtime/lock_futex.go:48
5 0x000000000044a6e8 in runtime.schedtrace
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:5645
6 0x000000000043d9f0 in runtime.startpanic_m
at /Users/luciencoffe/sdk/gotip/src/runtime/panic.go:1243
7 0x000000000043d6ee in runtime.fatalthrow.func1
at /Users/luciencoffe/sdk/gotip/src/runtime/panic.go:1150
8 0x000000000043d685 in runtime.fatalthrow
at /Users/luciencoffe/sdk/gotip/src/runtime/panic.go:1149
9 0x000000000043d37c in runtime.throw
at /Users/luciencoffe/sdk/gotip/src/runtime/panic.go:1076
10 0x0000000000449ee5 in runtime.checkdead.func1
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:5294
11 0x0000000000440811 in runtime.forEachG
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:607
12 0x0000000000449bd7 in runtime.checkdead
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:5281
13 0x000000000044b169 in runtime.mput
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:5774
14 0x000000000044391c in runtime.stopm
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:2410
15 0x0000000000444430 in runtime.findRunnable
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:3085
16 0x0000000000445afd in runtime.schedule
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:3438
17 0x0000000000446019 in runtime.park_m
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:3589
18 0x00000000004714ee in runtime.mcall
at /Users/luciencoffe/sdk/gotip/src/runtime/asm_amd64.s:458
Another thread is stuck attempting to acquire the lock, specifically the injectglist/startm thread:
0 0x00000000004751c3 in runtime.futex
at /Users/luciencoffe/sdk/gotip/src/runtime/sys_linux_amd64.s:558
1 0x00000000004390b0 in runtime.futexsleep
at /Users/luciencoffe/sdk/gotip/src/runtime/os_linux.go:69
2 0x000000000040bb45 in runtime.lock2
at /Users/luciencoffe/sdk/gotip/src/runtime/lock_futex.go:107
3 0x000000000040c373 in runtime.lockWithRank
at /Users/luciencoffe/sdk/gotip/src/runtime/lockrank_off.go:22
4 0x000000000040baa5 in runtime.lock
at /Users/luciencoffe/sdk/gotip/src/runtime/lock_futex.go:48
5 0x0000000000443a16 in runtime.startm
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:2452
6 0x000000000046aed8 in runtime.injectglist.func1
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:3373
7 0x000000000044593a in runtime.injectglist
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:3383
8 0x0000000000425f3d in runtime.(*scavengerState).wake
at /Users/luciencoffe/sdk/gotip/src/runtime/mgcscavenge.go:427
9 0x000000000044a247 in runtime.sysmon
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:5476
10 0x0000000000442473 in runtime.mstart1
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:1507
11 0x00000000004423c5 in runtime.mstart0
at /Users/luciencoffe/sdk/gotip/src/runtime/proc.go:1464
12 0x0000000000471465 in runtime.mstart
at /Users/luciencoffe/sdk/gotip/src/runtime/asm_amd64.s:395
13 0x000000000047526f in runtime.clone
at /Users/luciencoffe/sdk/gotip/src/runtime/sys_linux_amd64.s:618
Change https://go.dev/cl/487316 mentions this issue: runtime: resolve checkdead panic by refining startm lock handling in caller context
Regarding schedtrace, we have this problem in a few other places: throw can be called in nearly any context, with arbitrary locks held.
The throws in checkdead should unlock sched.lock before throwing. The final one does already (https://cs.opensource.google/go/go/+/master:src/runtime/proc.go;l=5258;bpv=1;bpt=1?), but not the others.
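A minimal sketch of what that could look like for the earlier throws (hypothetical shape only, not an actual change; checkdead is entered with sched.lock held):

// Hypothetical shape only. Releasing sched.lock before throwing avoids
// deadlocking the throw path itself (e.g. startpanic_m -> schedtrace(true),
// which also wants sched.lock).
unlock(&sched.lock)
print("runtime: checkdead: find g ", gp.goid, " in status ", status, "\n")
throw("checkdead: runnable g")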
submitted https://go.dev/cl/487316 for #59758
Is it possible to get this bugfix backported to 1.20? It is affecting users of Prometheus monitoring as we default GOMAXPROCS=1 for the node_exporter to reduce problems with data races in the Linux kernel.
@gopherbot Please open a backport issue to 1.20.
Quoting @SuperQ:
Is it possible to get this bugfix backported to 1.20? It is affecting users of Prometheus monitoring as we default GOMAXPROCS=1 for the node_exporter to reduce problems with data races in the Linux kernel.
Backport issue(s) opened: #60760 (for 1.20).
Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.
@gopherbot Please open a backport to 1.19. This issue is not new in 1.20, and potentially affects users with GOMAXPROCS=1.
Change https://go.dev/cl/504395 mentions this issue: [release-branch.go1.20] runtime: resolve checkdead panic by refining startm lock handling in caller context
Change https://go.dev/cl/504396 mentions this issue: [release-branch.go1.19] runtime: resolve checkdead panic by refining startm lock handling in caller context
What version of Go are you using (go version)?
1.20.3
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
This happens on m3.medium AWS instances. I could not reproduce it anywhere else so far. This is a single-threaded amd64 instance.
What did you do?
What did you expect to see?
No crash
What did you see instead?
A crash with "runtime: checkdead: found g XX in status 1". This crash happens on average after a few hours, apparently correlated with network activity - the more network activity (i.e., the more goroutines waiting on the network), the more often it happens.
Additional details
We've encountered a race condition in the Go runtime that leads to a panic in the checkdead() function when running on a single-threaded instance (GOMAXPROCS=1). The issue seems related to a specific scenario in the findRunnable() function where a goroutine is readied between findRunnable()'s final check and the stopm() call on line 3007 acquiring the scheduler lock, causing an inconsistent scheduler state.

Recap:
1. findRunnable() cannot find any goroutines to return, reaching the end of the function and calling stopm().
2. stopm() acquires the lock on the scheduler and calls mput().
3. mput() adds the M to the idle pool and calls checkdead().
4. checkdead() finds a runnable G and panics.

The race condition appears to be related to the "delicate dance" code in findRunnable(), which tries to handle a race between the network poller readying a goroutine and the M being stopped. However, this race condition can still occur in rare cases, leading to a panic in checkdead().
To investigate this issue further, we tried to create a minimal reproducer, but we could not trigger it consistently due to the race condition's rarity. Nonetheless, runtime stack traces from the checkdead() panics in the real-world scenario clearly show that it originates from the specific stopm() call in findRunnable(), indicating a strong correlation between the race condition and the panic.

A temporary fix patch has been applied to the checkdead() function to detect and handle the inconsistent state caused by the race condition without panicking immediately. Instead, it collects information and prints debug logs about the state of goroutines and the runtime stack. It only panics when the inconsistent state occurs too many times in a row or for an extended period of time. All occurrences so far were already resolved (i.e. an M was active) on the next call to checkdead().

This is of course not a viable solution, as it does not fix the root cause of the issue and may introduce other problems.
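For reference, the temporary mitigation has roughly the following shape (a hypothetical sketch, not the actual patch; the threshold and helper name are made up):

// Hypothetical sketch of the temporary checkdead mitigation (illustration only).
// Instead of throwing on the first inconsistent observation, log and tolerate a
// few consecutive ones; the counter is reset whenever checkdead later finds a
// consistent state (an M is active again).
var checkdeadRunnableStreak int

func reportRunnableG(gp *g, status uint32) {
    checkdeadRunnableStreak++
    if checkdeadRunnableStreak < 3 { // arbitrary threshold
        print("[checkdead] find g ", gp.goid, " in status ", status, "\n")
        // also dump sched counters, the goroutine list and the runtime stack
        return // give the scheduler a chance to recover before the next check
    }
    unlock(&sched.lock)
    throw("checkdead: runnable g")
}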