golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

runtime: 'wait: bad address' on FreeBSD/amd64 #6372

Closed wathiede closed 9 years ago

wathiede commented 11 years ago
I run a buildbot for bradfitz's camlistore project on my FreeBSD/amd64 machine (uname:
FreeBSD sagan.sf.xinu.tv 8.3-RELEASE-p9 FreeBSD 8.3-RELEASE-p9 #0: Fri Jul 26 23:07:20
UTC 2013     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64). 

Once in a long while I see something fail to build with 'wait: bad address'. Camlistore
builds against go-release and go-tip for every change submitted to Go or camlistore.
Today I saw the error with tip 98971b9411b9, but I think I've seen the error with
go-release too. I'd say it happens once a week or so.

A Google search for 'site:build.golang.org "wait: bad address"' shows
two failed builds on the official Go buildbot with the same error, both on FreeBSD/amd64.

I have no idea what is going on here, but I'm filing the issue in the event someone has
an idea how to track this down.
robpike commented 11 years ago

Comment 1:

This error has come up occasionally when doing heavy workloads on FreeBSD. We do not
have a handle on it although some changes involving the madvise system call have reduced
its frequency.
My uninformed opinion is that it is either a kernel bug in FreeBSD or the Go
implementation stumbling over an inconsistency between FreeBSD and the other Unix
implementations.
If you look into the problem, you'll see it's all but inconceivable that this error can
arise. An address becomes invalid in a situation where that truly cannot happen.
We need more reproducible examples or fewer FreeBSDs.

Labels changed: added os-freebsd, priority-someday, removed priority-triage.

Status changed to Accepted.

ianlancetaylor commented 11 years ago

Comment 2:

In fairness, it could happen in principle if there were a GC bug.  The goroutine would
call wait, which would cause a thread to suspend until the wait system call returned. 
The wait system call would be pointing to an integer on the heap.  A GC bug could free
that integer even though there is a pointer to it on the goroutine stack.  It's possible
that everything else on the page would also be freed.  The scavenger could then release
the page back to the OS via madvise.  Then the wait could return, and get precisely that
error.
It doesn't seem very likely but I can't think of anything else other than a kernel bug.
wathiede commented 11 years ago

Comment 3:

Simple, and reproduces fairly quickly:
$ go run wait.go
2013/09/12 18:45:03 Found 8 CPUs, spawning go routines
2013/09/12 18:45:07 5 wait: bad address
exit status 1
$ go run wait.go
2013/09/12 18:45:12 Found 8 CPUs, spawning go routines
2013/09/12 18:45:20 2 wait: bad address
exit status 1
$ go run wait.go
2013/09/12 18:45:30 Found 8 CPUs, spawning go routines
2013/09/12 18:45:33 4 wait: bad address
exit status 1
$ go run wait.go
2013/09/12 18:46:52 Found 8 CPUs, spawning go routines
2013/09/12 18:46:53 4 53 wait: bad address
exit status 1
$ go run wait.go
2013/09/12 18:48:44 Found 8 CPUs, spawning go routines
2013/09/12 18:48:47 7 648 wait: bad address
exit status 1

Attachments:

  1. bug6372.go (534 bytes)
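
The attachment is not reproduced in this thread. The following is only a rough sketch of what a stress test of this shape might look like, not the original bug6372.go: the function name hammer, the iteration count, and the choice of the "true" command are all assumptions. It starts one goroutine per available CPU, and each goroutine repeatedly runs a short-lived child process and waits for it, reporting the goroutine index, iteration, and error on the first failure.

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
	"runtime"
	"sync"
)

// hammer (hypothetical) runs `workers` goroutines; each starts cmd and waits
// for it up to `iters` times. It returns one of the errors seen, or nil.
func hammer(workers, iters int, cmd string) error {
	var wg sync.WaitGroup
	errs := make(chan error, workers) // each worker sends at most once
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				// Run starts the child and then waits for it to exit.
				if err := exec.Command(cmd).Run(); err != nil {
					errs <- fmt.Errorf("%d %d %v", w, i, err)
					return
				}
			}
		}(w)
	}
	wg.Wait()
	close(errs)
	return <-errs // receiving from a closed, empty channel yields nil
}

func main() {
	n := runtime.GOMAXPROCS(0) // current setting; does not change it
	log.Printf("Found %d CPUs, spawning goroutines", n)
	if err := hammer(n, 100000, "true"); err != nil {
		log.Fatal(err)
	}
}
```

Unlike the runs shown above, this sketch waits for all workers to stop before reporting, so it trades immediacy for a clean shutdown.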
davecheney commented 11 years ago

Comment 4:

Thanks for the repro.

Labels changed: added priority-soon, go1.2, removed priority-someday.

wathiede commented 11 years ago

Comment 5:

To explore Ian's suggestion, I ran it with the garbage collector disabled:
$ GOGC=off go run wait.go
2013/09/12 18:58:52 Found 8 CPUs, spawning go routines
It ran for 30 minutes without failing before I got bored and ctrl-c'd it.
davecheney commented 11 years ago

Comment 7:

Slightly simplified example:
http://play.golang.org/p/ROB_uGzYxR
# panics within seconds
[dfc@deadwood ~/src]$ GOMAXPROCS=2 go run bug6372.go
2013/09/16 15:39:06 Found 2 CPUs, spawning go routines
2013/09/16 15:39:07 1 1035 wait: bad address
exit status 1
# runs for longer than my attention span would allow.
[dfc@deadwood ~/src]$ GOMAXPROCS=1 go run bug6372.go
2013/09/16 15:39:11 Found 1 CPUs, spawning go routines
^Cexit status 2
Is there a way to increase the number of GC worker threads without increasing the number
of concurrent g's?
rsc commented 11 years ago

Comment 8:

Thank you for the much simplified case. I have reproduced the problem in C. It is a
FreeBSD kernel bug. I filed http://www.freebsd.org/cgi/query-pr.cgi?pr=182161 and will
work around it in the Go library. The fix is to avoid using the SYSCALL instruction.

Owner changed to @rsc.

rsc commented 11 years ago

Comment 9:

This issue was closed by revision 555da73c566c156a6982da0e06d49c71f9ea25d.

Status changed to Fixed.

wathiede commented 11 years ago

Comment 10:

In case you hadn't seen, this breaks the build:
http://build.golang.org/log/a09e574dfbb72c98721571ed8e87e634faeb7863