golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

runtime: add test for syscall failing to create new OS thread during syscall.Exec #20822

Open jvshahid opened 7 years ago

jvshahid commented 7 years ago


What version of Go are you using (go version)?

go version go1.8 linux/amd64 (same behavior with 1.8.3)

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/jvshahid/codez/gocodez"
GORACE=""
GOROOT="/home/jvshahid/.gvm/gos/go1.8"
GOTOOLDIR="/home/jvshahid/.gvm/gos/go1.8/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build588313748=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

Run main.go (a small program that calls syscall.Exec) repeatedly in a shell loop, e.g. while true; do go run main.go; done
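
The issue text as captured here does not include main.go itself. Going by the title and the later mention of `echo $PWD`, it presumably execs a shell that prints the working directory. A minimal sketch under that assumption; the `/bin/sh` path and the `shellArgv` name are illustrative, not the reporter's code:

```go
// main.go: hypothetical reproducer; the original program is not shown in the issue.
package main

import (
	"fmt"
	"os"
	"syscall"
)

// shellArgv is the argument vector for the exec'd shell (assumed, see above).
var shellArgv = []string{"sh", "-c", "echo $PWD"}

func main() {
	// syscall.Exec replaces the running Go process with /bin/sh.
	// On success it never returns; the shell prints $PWD and exits.
	err := syscall.Exec("/bin/sh", shellArgv, os.Environ())
	// Reaching this line means execve(2) itself failed.
	fmt.Fprintln(os.Stderr, "exec failed:", err)
	os.Exit(1)
}
```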

What did you expect to see?

/path/to/pwd
/path/to/pwd
/path/to/pwd
/path/to/pwd
/path/to/pwd
/path/to/pwd

What did you see instead?

runtime: failed to create new OS thread (have 5 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc

Kernel version (uname -a)

Linux amun 4.4.0-81-generic #104-Ubuntu SMP Wed Jun 14 08:17:06 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Several issues have been opened in the past with the same error message. The most relevant comment I found across them is this comment, which suggests that this could be a kernel issue and asks for a way to reproduce the problem. Some interesting notes:

  1. Setting GOMAXPROCS to 1 makes the problem hard to reproduce (it may even eliminate it).
  2. The Go runtime usually gets a chance to run for a while before the process's threads are killed. That means the process will sometimes exec successfully and exit 0, and will sometimes exit with a non-zero status code after panicking.
jvshahid commented 7 years ago

/cc @ianlancetaylor since i referenced his comment

ianlancetaylor commented 7 years ago

I'm not surprised that this fails, and I don't think it's a bug. Running go run main.go means starting the Go tool, which will look at main.go, check that all the imports are up to date, run the compiler, run the linker, and only then run your (simple) program. While it is doing that, your shell loop has plenty of time to loop around and start another instance of go run main.go. The number of go run main.go builds running in parallel will steadily increase, especially as the load on the system increases and each one takes longer and longer to complete. Soon you will hit your process limit (which you can see by running ulimit -u) and you will get the error you are reporting.

If you want to show a real problem, run go build main.go and then run ./main in a loop. Then you will be running a very simple program where there is a realistic possibility that the program can complete in the time it takes the shell to loop around. Even then I expect they will tend to stack up, but it should take a lot longer.
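
The build-once-then-loop approach can also be scripted in Go. exec.Cmd.Run blocks until each child exits, so runs are strictly sequential, and the loop can tally the two outcomes the reporter described: a clean exit 0 versus a non-zero exit after the runtime dies with `fatal error: newosproc`. A sketch, assuming the reproducer was built as ./main; `runLoop` and `tally` are hypothetical names:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// tally counts outcomes across iterations.
type tally struct {
	ok        int // child exited 0
	newosproc int // child died with "fatal error: newosproc"
	other     int // any other non-zero exit
}

// runLoop runs path n times, one after another, and classifies each result.
func runLoop(path string, n int) (tally, error) {
	var t tally
	for i := 0; i < n; i++ {
		var stderr bytes.Buffer
		cmd := exec.Command(path)
		cmd.Stderr = &stderr
		err := cmd.Run() // blocks until the child exits
		var exitErr *exec.ExitError
		switch {
		case err == nil:
			t.ok++
		case errors.As(err, &exitErr):
			if strings.Contains(stderr.String(), "fatal error: newosproc") {
				t.newosproc++
			} else {
				t.other++
			}
		default:
			// The child could not be started at all (e.g. path not found).
			return t, err
		}
	}
	return t, nil
}

func main() {
	res, err := runLoop("./main", 1000)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("ok=%d newosproc=%d other=%d\n", res.ok, res.newosproc, res.other)
}
```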

jvshahid commented 7 years ago

This while loop is running go run main.go synchronously, i.e. it waits for each run to exit. A simple way to verify that is to replace the echo $PWD with echo before && sleep 10 && echo after.

jvshahid commented 7 years ago

Also worth noting: this is reproducible after a few runs (10 or 20). It is not consuming all the PIDs on the system.

ianlancetaylor commented 7 years ago

Ah, OK, sorry.

What does ulimit -u print on your system?

Immediately after the loop fails, what does ps print?

jvshahid commented 7 years ago
$ ulimit -u
62821

It is really hard to make the loop fail, but I currently have 377 threads running. I don't imagine this loop is adding enough processes and/or threads to exceed the limit:

$ ps -elF | wc -l
377
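
One caveat on that count: `ps -elF` prints one line per process (plus a header line), not one per thread; `ps -eLf` lists threads. Since on Linux the `ulimit -u` limit (RLIMIT_NPROC) is enforced on threads, a closer figure is the sum of the `Threads:` field across `/proc/<pid>/status`. A Linux-only sketch; `threadsFromStatus` is a hypothetical helper, and it sums over all users rather than filtering by UID:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// threadsFromStatus extracts the "Threads:" value from the contents of a
// /proc/<pid>/status file.
func threadsFromStatus(status string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && fields[0] == "Threads:" {
			return strconv.Atoi(fields[1])
		}
	}
	return 0, fmt.Errorf("no Threads: line")
}

func main() {
	paths, err := filepath.Glob("/proc/[0-9]*/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	total := 0
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // the process exited between Glob and ReadFile
		}
		if n, err := threadsFromStatus(string(data)); err == nil {
			total += n
		}
	}
	fmt.Printf("processes=%d threads=%d\n", len(paths), total)
}
```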
jvshahid commented 7 years ago

Here's the system wide limits:

$ cat /proc/sys/kernel/pid_max
32768
$ cat /proc/sys/kernel/threads-max
125642

I really doubt this has anything to do with limits.

bradfitz commented 7 years ago

Any difference with Go 1.9beta2?

ianlancetaylor commented 7 years ago

Ah, you're right. This is #18146 for a program that doesn't use cgo. Sorry for forgetting about that.

jvshahid commented 7 years ago

@bradfitz Yes, go1.9beta2 fixes the issue. I'm guessing commit 91139b87f776a553524b022753981e7909386777 fixed it by introducing a lock. I'm also curious whether you think setting GOMAXPROCS to 1 is a reasonable workaround in the meantime?

bradfitz commented 7 years ago

Good to hear. So I guess what this bug needs now is a test.

I think we'd prefer you use go1.9beta2 as your fix rather than GOMAXPROCS=1 as a workaround. Go 1.9 has no known bugs compared to Go 1.8.

jvshahid commented 7 years ago

@bradfitz Do you think converting the bash loop into a Go test would be OK to merge? I'm concerned that it might be a flaky test. What do you think?

bradfitz commented 7 years ago

Ideally the test should execute pretty quickly. And flaky tests are no good, but I don't see why this one would be flaky. Rather than expect a failure in, say, 10,000 iterations, just do 10,000 iterations and pass if you don't get a failure. That assumes you used to generally get a failure within 10,000 iterations.

odeke-em commented 6 years ago

Hello @jvshahid, might you be interested or available to submit a CL with the suggested test for Go1.11?