Closed wathiede closed 9 years ago
This error has come up occasionally when doing heavy workloads on FreeBSD. We do not have a handle on it although some changes involving the madvise system call have reduced its frequency. My uninformed opinion is that it is either a kernel bug in FreeBSD or the Go implementation stumbling over an inconsistency between FreeBSD and the other Unix implementations. If you look into the problem, you'll see it's all but inconceivable that this error can arise. An address becomes invalid in a situation where that truly cannot happen. We need more reproducible examples or fewer FreeBSDs.
Labels changed: added os-freebsd, priority-someday, removed priority-triage.
Status changed to Accepted.
In fairness, it could happen in principle if there were a GC bug. The goroutine would call wait, which would cause a thread to suspend until the wait system call returned. The wait system call would be pointing to an integer on the heap. A GC bug could free that integer even though there is a pointer to it on the goroutine stack. It's possible that everything else on the page would also be freed. The scavenger could then release the page back to the OS via madvise. Then the wait could return, and get precisely that error. It doesn't seem very likely but I can't think of anything else other than a kernel bug.
Simple, and reproduces fairly quickly:

$ go run wait.go
2013/09/12 18:45:03 Found 8 CPUs, spawning go routines
2013/09/12 18:45:07 5 wait: bad address
exit status 1
$ go run wait.go
2013/09/12 18:45:12 Found 8 CPUs, spawning go routines
2013/09/12 18:45:20 2 wait: bad address
exit status 1
$ go run wait.go
2013/09/12 18:45:30 Found 8 CPUs, spawning go routines
2013/09/12 18:45:33 4 wait: bad address
exit status 1
$ go run wait.go
2013/09/12 18:46:52 Found 8 CPUs, spawning go routines
2013/09/12 18:46:53 4 53 wait: bad address
exit status 1
$ go run wait.go
2013/09/12 18:48:44 Found 8 CPUs, spawning go routines
2013/09/12 18:48:47 7 648 wait: bad address
exit status 1
Slightly simplified example http://play.golang.org/p/ROB_uGzYxR

# panics within seconds
[dfc@deadwood ~/src]$ GOMAXPROCS=2 go run bug6372.go
2013/09/16 15:39:06 Found 2 CPUs, spawning go routines
2013/09/16 15:39:07 1 1035 wait: bad address
exit status 1

# runs for longer than my attention span would allow.
[dfc@deadwood ~/src]$ GOMAXPROCS=1 go run bug6372.go
2013/09/16 15:39:11 Found 1 CPUs, spawning go routines
^Cexit status 2

Is there a way to increase the number of GC worker threads without increasing the number of concurrent g's?
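The programs behind the output above are not reproduced in this thread. As a rough sketch of what a reproducer of this shape might look like, assuming it simply runs a trivial child process in a loop from one goroutine per CPU, something like the following would produce log lines in the same form (goroutine number, iteration, error). The `stress` function, the iteration count, and the choice of `true` as the child command are all assumptions, not taken from wait.go or bug6372.go.

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
	"runtime"
	"sync"
)

// stress (hypothetical) starts `workers` goroutines. Each one runs the named
// command `iterations` times and waits for it to exit. It returns one of the
// errors encountered, tagged with worker and iteration number, or nil if
// every run succeeded.
func stress(workers, iterations int, name string) error {
	var wg sync.WaitGroup
	errs := make(chan error, workers) // one slot per worker, so sends never block
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				// Run is Start followed by Wait; the wait is where the
				// "bad address" error in this issue was reported.
				if err := exec.Command(name).Run(); err != nil {
					errs <- fmt.Errorf("%d %d %v", w, i, err)
					return
				}
			}
		}(w)
	}
	wg.Wait()
	close(errs)
	return <-errs // receiving from a closed, empty channel yields nil
}

func main() {
	n := runtime.NumCPU()
	log.Printf("Found %d CPUs, spawning go routines", n)
	// Bounded here so the sketch terminates; the runs above appear to loop
	// until the first failure.
	if err := stress(n, 100, "true"); err != nil {
		log.Fatal(err)
	}
}
```

On a system without the kernel bug this should exit quietly. The GOMAXPROCS=1 run above suggests that more than one thread has to be running for the failure to appear.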
Thank you for the very simplified case. I have reproduced the problem in C. It is a FreeBSD kernel bug. I filed http://www.freebsd.org/cgi/query-pr.cgi?pr=182161 and will work around it in the Go library. The fix is not to use the SYSCALL instruction.
Owner changed to @rsc.
In case you hadn't seen, this breaks the build: http://build.golang.org/log/a09e574dfbb72c98721571ed8e87e634faeb7863