runtime: tight loop hangs process completely after some time

creker commented 8 years ago

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)? go version go1.6.2 windows/amd64
What operating system and processor architecture are you using (go env)? Windows 10.0.10586 amd64
What did you do? Ran this code

package main

import (
    "log"
    "runtime"
)

func main() {
    runtime.GOMAXPROCS(2)
    ch := make(chan bool)

    go func() {
        for {
            ch <- true
            log.Println("sent")
        }
    }()

    go func() {
        for {
            <-ch
            log.Println("received")
        }
    }()

    for {   
    }
}

What did you expect to see? Process printing "sent" and "received" until terminated
What did you see instead? Process runs and prints as expected for about 2 seconds and then hangs. Nothing is printed after that, process just eats up CPU. No panics or anything.

I put runtime.GOMAXPROCS(2) to make sure that there are multiple threads that goroutines can run on. Obviously, with runtime.GOMAXPROCS(1) the process would hang immediately, as expected: the for loop will not yield execution.

I tried to replace the for loop with this so that main goroutine can yield execution:

go func() {
    for {
    }
}()

select {}

But exactly the same thing happens. Now, if I put time.Sleep(10 * time.Millisecond) or longer after log.Println("sent"), the process no longer hangs. I ran it for a minute and it just kept going. I don't know, maybe it would still hang much later. If I change it to 2 ms, it hangs after 30 seconds. I tried to collect trace data, but it looks like it gets corrupted because the trace doesn't finish. When I try to view the trace it says "failed to parse trace: no EvFrequency event".

Everything behaves exactly the same on Mac OS X El Capitan 10.11.4 (15E65) with Go 1.6.2.

I read #10958, but the weird thing here is that it actually runs completely fine for a while and only after that hangs.

ianlancetaylor commented 8 years ago

I cannot recreate the problem on GNU/Linux (using the select {} version; I don't think the for {} version is interesting for us). I don't see how this could be Windows-specific, but could somebody with a Windows machine try to recreate the problem on Windows? Thanks.

creker commented 8 years ago

It's not Windows-specific. The same thing happens on OS X.

Just tested both versions on Ubuntu 14.04 LTS 3.13.0-24-generic virtual machine with Go 1.6.2 64-bit. Both versions hang after 20 seconds. Adding time.Sleep(10 * time.Millisecond) gives the same result as on other OSes.

ianlancetaylor commented 8 years ago

I just ran the program using select {} on GNU/Linux for over six minutes without a problem. This was on a native kernel, not a VM, on Ubuntu 14.04.

When the program hangs on GNU/Linux, kill it by typing ^\. That should dump a complete stack backtrace. Attach that here. Thanks.
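For reference, a dump similar to the one requested above can also be produced from inside a program. The following is a minimal sketch, assuming the standard runtime.Stack call; the dumpStacks helper is a hypothetical name introduced only for illustration. It prints the stack of every goroutine, including one parked on a channel receive.

```go
package main

import (
	"fmt"
	"runtime"
)

// dumpStacks is a hypothetical helper: it returns the stack traces of all
// goroutines, similar in spirit to what SIGQUIT prints.
func dumpStacks() string {
	buf := make([]byte, 1<<16)
	n := runtime.Stack(buf, true) // true = include every goroutine, not just the caller
	return string(buf[:n])
}

func main() {
	block := make(chan struct{})
	go func() { <-block }() // a goroutine parked on a channel receive

	runtime.Gosched() // give the goroutine a chance to start and block
	fmt.Print(dumpStacks())
}
```

Collecting all stacks requires the runtime to pause the other goroutines, so on the Go version discussed in this thread this approach would presumably be subject to the same stall when a goroutine spins in for {}; sending SIGQUIT from outside does not have that limitation.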

creker commented 8 years ago

Another interesting find. I was running the program through ssh, which caused the program to output more slowly, and the process was no longer hanging. Once I ran it in the VM terminal itself, it did hang. I tried to output to a file instead of the console to remove the bottleneck: it hangs within a second. So it looks like execution speed affects this issue.

Source

package main

import (
    "log"
    "runtime"
)

func main() {
    runtime.GOMAXPROCS(2)
    ch := make(chan bool)

    go func() {
        for {
            ch <- true
            log.Println("sent")
        }
    }()

    go func() {
        for {
            <-ch
            log.Println("received")
        }
    }()

    go func() {
        for {
        }
    }()

    select {
    }
}

Linux backtrace

SIGQUIT: quit
PC=0x401310 m=0

goroutine 7 [running]:
main.main.func3()
        /home/uweb/gowork/src/issue/main.go:27 fp=0xc820022fc0 sp=0xc820022fb8
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc820022fc8 sp=0xc820022fc0
created by main.main
        /home/uweb/gowork/src/issue/main.go:29 +0x9e

goroutine 1 [select (no cases)]:
main.main()
        /home/uweb/gowork/src/issue/main.go:31 +0xa3

goroutine 5 [running]:
        goroutine running on other thread; stack unavailable
created by main.main
        /home/uweb/gowork/src/issue/main.go:17 +0x64

goroutine 6 [chan receive]:
main.main.func2(0xc8200140c0)
        /home/uweb/gowork/src/issue/main.go:21 +0x42
created by main.main
        /home/uweb/gowork/src/issue/main.go:24 +0x86

rax    0x0
rbx    0x401310
rcx    0xc820022800
rdx    0x52e288
rdi    0x42f690
rsi    0x589b60
rbp    0x0
rsp    0xc820022fb8
r8     0x589ea0
r9     0x0
r10    0x0
r11    0x0
r12    0x2c
r13    0x52d8e4
r14    0x0
r15    0x8
rip    0x401310
rflags 0x206
cs     0x33
fs     0x0
gs     0x0
exit status 2

OS X backtrace

SIGQUIT: quit
PC=0x2350 m=0

goroutine 7 [running]:
main.main.func3()
    /Users/creker/Documents/Projects/go/src/hello/main.go:27 fp=0xc82002afc0 sp=0xc82002afb8
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc82002afc8 sp=0xc82002afc0
created by main.main
    /Users/creker/Documents/Projects/go/src/hello/main.go:29 +0x9e

goroutine 1 [select (no cases)]:
main.main()
    /Users/creker/Documents/Projects/go/src/hello/main.go:31 +0xa3

goroutine 5 [chan send]:
main.main.func1(0xc8200140c0)
    /Users/creker/Documents/Projects/go/src/hello/main.go:14 +0x4b
created by main.main
    /Users/creker/Documents/Projects/go/src/hello/main.go:17 +0x64

goroutine 6 [running]:
    goroutine running on other thread; stack unavailable
created by main.main
    /Users/creker/Documents/Projects/go/src/hello/main.go:24 +0x86

rax    0x0
rbx    0x2350
rcx    0xc82002a800
rdx    0x12c7b0
rdi    0x303f0
rsi    0x1875c0
rbp    0x0
rsp    0xc82002afb8
r8     0x187900
r9     0x0
r10    0x0
r11    0x0
r12    0x2c
r13    0x12be30
r14    0x0
r15    0x8
rip    0x2350
rflags 0x206
cs     0x2b
fs     0x0
gs     0x0
exit status 2

rhedile commented 8 years ago

I can confirm the behaviour on 14.04 on a KVM with 3 VPUs. go is 1.6.0

This is the scheduler state as the program begins to spin.

2016/04/27 05:48:11 received
2016/04/27 05:48:11 sent
SCHED 1016ms: gomaxprocs=2 idleprocs=0 threads=5 spinningthreads=0 idlethreads=2 runqueue=0 gcwaiting=1 nmidlelocked=0 stopwait=1 sysmonwait=0
  P0: status=3 schedtick=25 syscalltick=163151 m=4 runqsize=0 gfreecnt=0
  P1: status=1 schedtick=2 syscalltick=0 m=0 runqsize=0 gfreecnt=0
  M4: p=0 curg=20 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 helpgc=0 spinning=false blocked=false lockedg=-1
  M3: p=-1 curg=-1 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 helpgc=0 spinning=false blocked=false lockedg=-1
  M2: p=-1 curg=-1 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 helpgc=0 spinning=false blocked=false lockedg=-1
  M1: p=-1 curg=-1 mallocing=0 throwing=0 preemptoff= locks=1 dying=0 helpgc=0 spinning=false blocked=false lockedg=-1
  M0: p=1 curg=21 mallocing=0 throwing=0 preemptoff= locks=0 dying=0 helpgc=0 spinning=false blocked=false lockedg=-1
  G1: status=4(select (no cases)) m=-1 lockedm=-1
  G2: status=4(force gc (idle)) m=-1 lockedm=-1
  G17: status=4(GC sweep wait) m=-1 lockedm=-1
  G18: status=4(finalizer wait) m=-1 lockedm=-1
  G19: status=4(chan send) m=-1 lockedm=-1
  G20: status=2(chan receive) m=4 lockedm=-1
  G21: status=2() m=0 lockedm=-1
  G3: status=4(GC worker (idle)) m=-1 lockedm=-1
  G4: status=4(GC worker (idle)) m=-1 lockedm=-1

davecheney commented 8 years ago

@creker I'm sorry but we cannot accept a bug with a for {} infinite loop.

The reason this program stalls is the for {} will consume a proc, and this proc will not stop for garbage collection.

I am going to close this issue as I do not believe there is an issue. I recommend if you want to discuss this further please take this to another forum, such as the mailing list.

josharian commented 8 years ago

@davecheney it looks from my skimming of the issue that it also reproduces with select {}.

davecheney commented 8 years ago

@josharian I think there is still a for {} in there:

    go func() {
        for {
        }
    }()

    select {
    }

If this issue can be reproduced without a for {} then I am happy to see this issue reopened and investigated further.

josharian commented 8 years ago

Hmm. The original report doesn't match the later one. Those who can reproduce this: Does it reproduce without any for {} loops?

rhedile commented 8 years ago

Using an empty select has other side effects. Being very old and just a user I am very uncomfortable with the thought that "tick,tock" constructs accepted by the compiler and vet lead to a program that initially works as intended then enters an undefined condition without panic. Naturally this doesn't spin the runtime.

package main

import (
// "fmt"
)

func main() {
    ch := make(chan int)
    exit := make(chan bool)

    go func() {
        for {
            ch <- 1
            // fmt.Println("sent i is ", i)
        }
    }()

    go func() {
        var i int = 0
        for {
            i += <-ch
            // fmt.Println("received i is", i)
            if i > 1000000 {
                exit <- true
            }
        }
    }()

    <-exit
}

davecheney commented 8 years ago

Using an empty select has other side effects.

What other side effects ?

rhedile commented 8 years ago

Caveat: knowledge state go 1.6

For the same reason banks still use COBOL heap sorts for some tasks: predictability.

select{}, correctly, reads the channel list on entry. Then, correctly, checks the senders/receivers for the state of its cases. Unfortunately, the time spent reading the channel list is undefined. The channel list is protected by mutexes. If the rate of channel creation is proportional to load and the time spent waiting to read exceeds a gc cycle then pseudo random determines the read list complete. Every time select{} is woken it can block for an undefined period of time. We had a similar discussion last year. The consensus was "do not use defaults in select". One real reason was that the read message in the select case was being held until the select exited. However, time spent reentering the select in a for{} under load was the main cause of our performance loss.

rgds, Nigel Vickers

davecheney commented 8 years ago

I'm sorry, this seems unrelated to the original issue. The reason for using select {} over for {} is that, while both block the current goroutine from making any further progress, the former does it by removing the goroutine from the scheduler (as none of its zero cases are selectable), whereas the latter does so by spinning in a loop which cannot be interrupted.

If you believe there is a bug, can you please produce a runnable sample that does not use a for {} loop, preferably on play.golang.org, that demonstrates the issue.
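The difference described above can be illustrated with a small sketch. It assumes nothing beyond the standard runtime package; the gcWithParkedGoroutines helper is a hypothetical name. Goroutines blocked in select {} are parked by the scheduler, so a forced garbage collection has nothing to interrupt and returns promptly.

```go
package main

import (
	"fmt"
	"runtime"
)

// gcWithParkedGoroutines starts n goroutines that block forever in select {}
// and then forces a garbage collection. It returns the number of completed
// GC cycles reported by the runtime afterwards.
func gcWithParkedGoroutines(n int) uint32 {
	for i := 0; i < n; i++ {
		go func() {
			select {} // parked: the goroutine is off the run queue and uses no CPU
		}()
	}
	runtime.Gosched() // let the goroutines start and park

	runtime.GC() // returns: no goroutine is spinning, so none needs to be interrupted

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.NumGC
}

func main() {
	fmt.Println("completed GC cycles:", gcWithParkedGoroutines(4))
}
```

Replacing select {} in the goroutines with for {} is the variant this thread reports as hanging on Go 1.6.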

rhedile commented 8 years ago

I confirm that the behaviour experienced using for{} in main() in the test code was not experienced when replaced by select{} in our environment.

creker commented 8 years ago

The reason this program stalls is the for {} will consume a proc, and this proc will not stop for garbage collection.

Thank you, that does explain why this is happening. If I insert runtime.GC() in one of the goroutines but not the one with the for loop then program hangs upon calling it for the first time.

It still looks like strange behaviour to lock up the entire process, but at least I understand why it's happening. I hope that #10958 will be fixed, as it does look like it may affect real production code.
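The runtime.GC() experiment can be sketched as a small self-checking program. This is an illustration under stated assumptions, not the code that was actually run: gcCompletes is a hypothetical helper, and the spinner polls a stop flag only so that the sketch can clean up after itself. Based on the explanation quoted above, on Go 1.6 the call to runtime.GC() would be expected never to return, and the timeout itself may never fire because the whole process stalls. On Go 1.14 and later, which preempt goroutines asynchronously, the collection is expected to complete.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// gcCompletes starts one goroutine that spins without yielding and then
// calls runtime.GC() from another goroutine. It reports whether the
// collection finished within the timeout.
func gcCompletes(timeout time.Duration) bool {
	var stop int32
	defer atomic.StoreInt32(&stop, 1) // let the spinner exit once we are done

	go func() {
		for atomic.LoadInt32(&stop) == 0 {
			// tight loop: no explicit yield and no channel operation
		}
	}()

	done := make(chan struct{})
	go func() {
		runtime.GC() // must pause every running goroutine, including the spinner
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	fmt.Println("GC completed:", gcCompletes(2*time.Second))
}
```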

ianlancetaylor commented 8 years ago

@rhedile A literal select {} does not have any channels. It is compiled into a call to the runtime function block. The function does not acquire any mutexes, it simply blocks forever.

dr2chase commented 8 years ago

I'm starting to think that if the compiler sees an (obviously) infinite loop, it could arrange to insert a call to select{}

minux commented 8 years ago

I think inserting a call to runtime.Gosched() in (obvious) infinite loops might be more appropriate (it changes the code semantics less; select {} breaks the code, whereas the programmer obviously wants to loop).

At least most of the similar reports on the issue tracker involve for {}, so if the compiler could insert a call to runtime.Gosched() automatically, it should help with those reports.

However, people might intentionally use for {} to keep one goroutine busy, so I'm not sure we need to do something here.
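Written by hand, the idea of yielding inside a spin loop might look like the sketch below. The spinUntil helper and the flag variable are hypothetical names chosen for illustration; this is not what the compiler does or would do, only what a programmer could write today.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// spinUntil busy-waits until *flag becomes non-zero, yielding on every
// iteration. It returns the number of iterations it spun for.
func spinUntil(flag *int32) int {
	spins := 0
	for atomic.LoadInt32(flag) == 0 {
		runtime.Gosched() // yield so that the scheduler and the GC can make progress
		spins++
	}
	return spins
}

func main() {
	runtime.GOMAXPROCS(1) // a single P is the hardest case for a spinning goroutine

	var flag int32
	go func() {
		runtime.GC() // can proceed because the spinner yields
		atomic.StoreInt32(&flag, 1)
	}()

	spins := spinUntil(&flag)
	fmt.Println("spun", spins, "times before the flag was set")
}
```

The loop still keeps the goroutine busy, as some programs intend, but it no longer monopolises the proc.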

creker commented 8 years ago

Maybe the compiler should instead generate an error if it encounters an infinite loop? Right now the program just locks up without any diagnostic messages, and to understand why you need to understand how goroutines are scheduled. And in the case of this issue even that didn't help me; I didn't know that the GC could also do that.

for {} is not usable for anything; it just generates the issue. Even if for {} has a body, the compiler can probably detect that it will never call the runtime. For example, if every function call (which also doesn't call the runtime) is inlined, then the scheduler will not be called on function entry. But I suspect it would require much more complex analysis. On the other hand, if the loop body does anything useful then it's no longer an issue, because it will eventually call the runtime.

cznic commented 8 years ago

Maybe instead compiler should generate an error if he encounters an infinite loop?

Then there would be no way to write a CPU baking program.

On a more serious note, an empty for loop is a legal language concept, and sending SIGQUIT diagnoses it easily if needed.

creker commented 8 years ago

Well, it didn't help me. SIGQUIT didn't output anything that would tell me that it's GC that locked up the process. The stacktrace doesn't even mention any relevant Go runtime sources so that I could at least start somewhere.

Yes, a for {} loop is legal, but it leads to a program that locks up without telling you why. You have to understand the Go runtime to know why, and not just the basics of it. There are 3 solutions that I can think of right now:

1. Leave everything as it is but output better diagnostic messages so that the cause of the issue is obvious.
2. Insert a runtime.Gosched() call.
3. Don't allow infinite loops at all.

cznic commented 8 years ago

If SIGQUIT doesn't show the for {} loop line, it's probably worth filing an issue. Meanwhile, grep 'for {}' to the rescue. Most programs should not have that line, ever.

creker commented 8 years ago

It does show it, but it doesn't tell the reason. for {} is not the reason; how the GC works is what causes the issue. for {} just triggers it. The whole point here is to understand why.

I agree and, as I said, for {} is useless in real code. What I forgot to mention is that it's not me who found this issue: http://stackoverflow.com/questions/36826622/why-is-the-following-code-sample-stuck-after-some-iterations/ I couldn't understand why it behaves like it does, started playing with it and decided to open the issue to help me and everyone else understand what's going on.

It's an edge case that people hit when learning Go, and most of the time such cases are about goroutine scheduling. For example, you insert for {} and suddenly your goroutines are no longer scheduled because GOMAXPROCS=1 and the scheduler is never given a chance to execute any other goroutine. People still have difficulties with that, but at least SO has many great answers that cover exactly why it works like that. There are blog posts that cover the scheduler, and from those it's obvious why.

But the issue here is not covered anywhere, which leads to a bigger problem: the lack of good diagnostic messages when a process locks up and people don't understand why. Yes, it's useless non-production code, but it's very important when you're learning new stuff. You're playing with it, deliberately triggering edge cases to understand the limitations. And it's good when the program tells you that you reached a limit. Right now your program just hangs. To understand why, you either need to ask another question on SO, which will be closed as a duplicate or left unanswered, or you google anything on the Go runtime and read blog posts, Go team mailing lists and Google Docs. Here it didn't help me. No one gave an answer to that SO question; the accepted answer is wrong. And it's not like there isn't anyone who understands Go well: many answers are from Google employees themselves.

So it would be great to either somehow print a diagnostic message, which might not be very easy in these cases, or insert runtime.Gosched() and solve these issues once and for all. Right now it's like C++: something is broken but only a chosen few understand why. For me, that's not what Go is about.

Sorry for such a long comment.

RLH commented 8 years ago

The GC needs to preempt a goroutine in a timely fashion. Preemption happens at GC safepoints which include function calls as well as various channel and scheduler commands. If the time between these safepoints is large then the GC may not be able to make progress. For loops such as "for {}" that do not contain a safepoint this can hang the system.

There are a couple of ways to avoid this issue. One is to accept the fact that the GC may be delayed until the loop is exited and, if need be, add a runtime.Gosched call in the loop. Another is to teach the compiler to detect loops that do not contain a GC safepoint and insert a check and a safepoint. This overhead may adversely affect the performance of tight loops that folks care a lot about. At the cost of increasing the size of binaries the compiler could unroll the loop to improve performance. Unfortunately a compiler can't tell how long a loop will run and in fact whether or not it will exit. Any fix will have a downside.
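One way to keep the cost of a manual runtime.Gosched call low is to yield only every so many iterations. The sketch below is an illustration of that trade-off, not a recommendation from this thread; the sumTo function and the yieldEvery constant are assumptions picked for the example, and the right interval would depend on the workload.

```go
package main

import (
	"fmt"
	"runtime"
)

// yieldEvery is an assumed tuning knob, not a recommended value.
const yieldEvery = 1 << 20

// sumTo adds the integers 1..n in a loop that contains no other function
// calls, yielding once every yieldEvery iterations.
func sumTo(n uint64) uint64 {
	var sum uint64
	for i := uint64(1); i <= n; i++ {
		sum += i
		if i%yieldEvery == 0 {
			runtime.Gosched() // explicit safepoint inside an otherwise tight loop
		}
	}
	return sum
}

func main() {
	fmt.Println(sumTo(10000000))
}
```

The extra comparison per iteration is the kind of overhead the paragraph above is concerned about; yielding rarely keeps it small but lengthens the worst-case delay before the GC can stop the goroutine.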

At the end of the day it is a matter of where the community wants to put its resources. Education seems to be the best way forward for now. "Write programs that terminate" is a good first bit of advice. Another piece is to avoid long-running tight loops that do not contain function calls, yields, or channel operations.
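As an illustration of that advice, a goroutine that needs to wait for a result can block on a channel instead of polling a variable in a tight loop. This is a minimal sketch; waitForResult is a hypothetical helper name.

```go
package main

import "fmt"

// waitForResult runs work in its own goroutine and blocks on a channel until
// the result is ready, instead of spinning on a shared variable.
func waitForResult(work func() int) int {
	done := make(chan int)
	go func() { done <- work() }()
	return <-done // the waiting goroutine is parked here, not spinning
}

func main() {
	fmt.Println(waitForResult(func() int { return 6 * 7 }))
}
```

The channel receive is one of the safepoints mentioned above, so the waiting goroutine never stands in the way of the GC.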

minux commented 8 years ago

I just don't think we need to solve the problem.

Tight loops are created for a reason, and the compiler should respect that.

for {} is troublesome, but most of them are used in toy examples.

davecheney commented 8 years ago

I agree. I don't think this is a problem that needs to be solved in code.

golang / go

runtime: tight loop hangs process completely after some time #15442