proposal: runtime: garbage collect goroutines blocked forever

faiface commented 7 years ago

Hi everyone!

As of today, Go's runtime does not garbage collect blocked goroutines. The most cited reason being, that goroutines blocked forever are usually a bug and that collecting them would hide this bug. I would like to show a few examples, where garbage collected goroutines would really be appreciated and lead to a much safer and less buggy code.

What does it mean?

package main

import "fmt"

func numbers() <-chan int {
    ch := make(chan int)
    go func() {
        for i := 0; ; i++ {
            ch <- i
        }
    }()
    return ch
}

func main() {
    for i := 0; i < 1000; i++ {
        for num := range numbers() {
            if num >= 1000 {
                break
            }
            fmt.Println(num)
        }
    }
}

This code has a memory leak in current Go. The function numbers returns a channel that generates an infinite sequence of natural numbers. We get this sequence 1000 times in the main function, print the first 1000 numbers of each sequence and quit. The numbers function spawns a goroutine that feeds the channel with numbers. However, once we print the first 1000 numbers produced by the goroutine, the goroutine stays in the memory, blocked forever. If the first for-loop iterated forever, we would run out of memory very quickly.

How to collect goroutines?

I'd suggest the following algorithm:

All non-blocked goroutines are marked active.
All channels not reachable by active goroutines are marked inactive.
All gouroutines blocked on an inactive channel are marked dead and garbage collected.

Edit: Based on the discussion, I add one detail to the implementation. A goroutine marked dead would be collected in a way that's identical to calling runtime.Goexit inside that goroutine (https://golang.org/pkg/runtime/#Goexit).

Edit 2: Based on the further discussion, runtime.Goexit behavior is debatable and maybe not right.

What are the benefits?

Once such goroutine garbage collection is enabled, we can solve large variety of new problems.

Infinite generator functions. Generator is a function that returns a channel that sends a stream of values. Heavily used in functional languages, they are currently hardly possible in Go (we'd have to send them a stopping signal).
Finite generator functions. We can do them in Go, however, we have to drain the channel if we want to avoid a memory leak.
Manager goroutines. A nice way to construct a concurrent-safe object in Go is to, instead of guarding it by a mutex, create a manager goroutine, that ranges over a channel that sends commands to this object. Manager goroutine then executes this commands as they come in. This is hard to do in Go, because if the object goes out of scope, manager goroutine stays in the memory forever.

All of the described problems are solvable. Manual memory management is also solvable. And this is just like that. Inconvenient, error-prone and preventing us from doing some advanced stuff, that would be really useful.

I'd like to point out, that the whole talk "Advanced Go Concurrency Patterns" would be pointless, if Go collected blocked goroutines.

What do you guys think?

zigo101 commented 7 years ago

@Merovius

"they might not run, even if an object is unreachable"

Here, it is some different. If the blocked goroutine doesn't exit, the finalizers will absolutely not run. But if it exits, the finalizers may run. It is never expected if it really runs.

zigo101 commented 7 years ago

Maybe the rare case is not worth keeping compatibility for it.

I have another simple example. Should the two new created gorotuines be collected or not?

package main

func f(c1, c2 chan int) {
    <-c1
    c2 <- 1
}

func main() {
    c1, c2 := make(chan int), make(chan int)
    go f(c1, c2)
    go f(c2, c1)

    // ...
}

zigo101 commented 7 years ago

Looks the answer is yes.

Then how about this one? The two new created gorotuines will not be collected just for there is a variable holds the channels? or at least not collected unitil the g has been called?

package main

import "runtime"

func f(c chan int) {
    <- c
}

func g(c []chan int) {
    c[0] <- 1
}

func main() {
    c1, c2 := make(chan int), make(chan int)
    go f(c1)
    go f(c2)

    // ...

    cs := []chan int{make(chan int), c1, c2}
    go g(cs)

    // ...

    // ...
    runtime.KeepAlive(&cs)
}

RalphCorderoy commented 7 years ago

I see many CLs on the stdlib to reduce the amount of garbage generated, continual effort on the GC to improve its performance and characteristics, and work on the compiler to improve its escape analysis. To collect blocked-forever goroutines would be encouraging more garbage to be created as the balance of when to use a goroutine shifts. Does it give any benefit that can't be obtained with a little bit of extra coding, e.g. select on a closing channel, or Context?

nullchinchilla commented 7 years ago

In situations where you are programming in an actor-style with large amounts of communicating goroutines, each acting as an object, then using closing channels or Context becomes very cumbersome, similar to manually managing memory. Sometimes it is nice to be able to program in a style centered entirely around goroutines communicating with channels.

I do agree that in most existing Go code there is no need, but I think that's precisely because it is difficult to write in such a style currently, so it is avoided in favor of things like explicit mutexes or abusing Context in cases where "request scope" makes no sense. Go is supposed to be about programming in a CSP style, not in a C-like "call free/unlock on everything" style (which admittedly Go makes a bit easier due to defer), and collecting garbage goroutines will make it a lot easier.

The improvement to the GC, IMO, instead should be a reason to worry less about garbage in Go. Back when Go had a fully stop-the-world GC taking pains to avoid garbage in your code made sense, but now it doesn't. I've already started to remove all of my usages of tricks like object pooling (which really often circumvents memory safety) unless profiling shows significant performance decrease. Go has a very state-of-the-art low-latency GC, and programming idioms should reflect it.

rsc commented 7 years ago

One common question among new Go programmers is why there isn't a way to force kill a goroutine, like Unix kill -9, but for a single goroutine. The answer is that it would be too hard to program in that world, in fear that your goroutine might be killed at any moment. Each goroutine shares state with other goroutines.

What happens to the locks the goroutine holds? The invariants they protect may have been temporarily violated (that's what locks are for) and not yet reestablished. Releasing the lock will break the invariants permanently; not releasing the lock will deadlock the program.

What happens to wait groups waiting for that goroutine? If there's a deferred wg.Done, should it run? If the waitgroup just wants to know the goroutine is no longer exiting, maybe that's OK. But if the waitgroup has a semantic meaning like "all the work I kicked off is done", then it's probably not OK to report that the goroutine is done when in fact it's not really done, just killed.

What happens to the other goroutines that goroutine is expected to communicate with? Maybe the goroutine was about to send a result on a channel. The killer can possibly send the result on the killed goroutine's behalf, but how can the killer know whether the goroutine completed its own send or not before being killed?

For all these reasons and more, Go requires that if you want a goroutine to stop executing, you communicate that fact to the goroutine and let it stop gracefully.

Note that even in Unix, where processes don't share memory, you still end up with problems like in a pipeline p1 | p2 when p2 is killed and p1 keeps trying to write to it. At least in that case the write system call has a way to report errors (in contrast to operations like channel sends or mutex acquisitions), but all the complexity around SIGPIPE exists because too many programs still didn't handle errors correctly in that limited context.

The proposal in this issue amounts to "have the GC kill permanently blocked goroutines". But that raises all the same questions, and it's a mistake for all the same reasons.

More fundamentally, the GC's job is to provide the illusion of infinite memory by reclaiming and reusing memory in ways that do not break that illusion, so that the program behaves exactly as if it had infinite memory and never reused a byte. For all the reasons above, killing a goroutine would very likely break that illusion. If defers are run, or locks are released, now the program behaves differently. If the stack is reclaimed and that happens to cause finalizers to run, now the program behaves differently.

Perhaps worst of all, collecting blocked goroutines would mean that when your whole program deadlocks, there are no goroutines left! So instead of a very helpful snapshot of how all the goroutines got stuck, you get a print reporting a deadlock and no additional information. Deadlocks are today the best possible concurrency bug: when they happen, the program stops and sits there waiting for you to inspect it. Contrast that with race conditions, where the eventual crash happens maybe billions of instructions later and you have to find some way to reconstruct what might have gone wrong. If the GC discards information about deadlocked goroutines, even for partial deadlocks, this fantastic property of deadlocks - that they are easy to debug because everything is sitting right there waiting for you - goes out the window.

Debuggability is the same reason we don't do tail call optimization: when something goes wrong we want to have the whole stack that identifies how we got to the code in question. The useful information discarded by tail call optimization is nothing compared to the useful information discarded by GC reclaiming blocked goroutines.

On top of all these problems, it's actually very difficult in most cases to reliably identify goroutines that are blocked forever. So this optimization would very likely not fire often, making it a rare event. The last thing you want is for this super-subtle behavior that can make your program behave in mysterious ways only happen rarely.

I just don't see collecting permanently blocked goroutines happening in any form. There is a vanishingly small intersection between the set of situations where you even identify the opportunity reliably and the set of situations where discarding the goroutines is safe and doesn't harm debugging.

The GC could address both of these problems - correctness and debuggability - by reclaiming goroutines but being careful to keep around any state required so that it looks like they're still there: don't run defers, record all stack pointers to keep those objects and any finalizers reachable (now or in the future) from those objects live, record a text stack trace to show in the eventual program crash, and so on. But this is really just compression, not collection, since some information must still be preserved. Effort spent compressing leaked memory is probably wasted: better to make it easier for programmers to find and fix leaks instead.

bradfitz commented 7 years ago

@rsc,

Effort spent compressing leaked memory is probably wasted: better to make it easier for programmers to find and fix leaks instead

Thoughts on my comment to Keith above? https://github.com/golang/go/issues/19702#issuecomment-289178306

rsc commented 7 years ago

@bradfitz, I don't think this is worthwhile. The detection rate is so low that it won't be worth the complexity. Leaked goroutines don't even matter until there are a lot of them; people who care can watch runtime.NumGoroutines() against a limit they choose, and that will be much better at detecting. Or just take a look at /debug/pprof once in a while.

nullchinchilla commented 7 years ago

@rsc I don't think this should be about "killing" goroutines. Such a change will never change the observable behavior of any program except for memory usage. It is not about killing "infinitely blocked" goroutines, or deadlock detection, but rather collecting "unreachable" goroutines. A goroutine that's infinitely blocked but has pointer on its stack to a waitgroup that somebody else is waiting on is obviously not unreachable.

Nothing will change from the perspective of programmers using existing Go idioms, except perhaps debugging, which can easily be solved by adding a flag to GODEBUG's GC optiosn that prints something when goroutines are collected. The only things goroutine collecting would add are new idioms / design patterns. And some other languages do indeed collect threads in exactly this unobservable way to enable more design freedom.

Of course, it's fine if the larger Go community feels that idioms such as using background goroutines to synchronize an object don't belong in Go, but my point was that this proposal isn't about magically fixing memory leaks in existing programs, but rather to enable programs that haven't been written yet to be written.

rsc commented 7 years ago

@bunsim I understand your motivation was for new patterns. But I don't believe you can split the hairs well enough here to collect only the goroutines that aren't stuck due to bugs. And you shouldn't have to set a GODEBUG flag (that breaks the idioms you want to enable!) after the fact to get useful information about your deadlocked program.

faiface commented 7 years ago

@rsc When you or anybody else have dealt with goroutines stuck forever, how often was it a bug that wouldn't be solved by garbage collecting that goroutine? If it's often the case, could you give any example?

nullchinchilla commented 7 years ago

The GODEBUG flag wouldn't break the idioms, it would just print "# of goroutines collected this cycle" as it already does for bytes etc. And you already can totally disable the GC using debugging options anyway.

Nobody is suggesting some weird heuristic to collect only goroutines that aren't stuck due to bugs. It's perfectly okay to collect ones that are stuck due to bugs, it only loses us some debugging info we can recover by simply disabling the GC. And a large amount of existing "buggy code" will simply be the correct way of doing things in the future.

The whole argument sounds suspiciously like "we shouldn't use a GC because it hides bugs from Valgrind".

ianlancetaylor commented 7 years ago

I want to stress @rsc 's point that if this is to change the way that people write Go code, then it is essential that it be clear when goroutines will be collected, and that goroutines will be reliably collected in that state. That seems to me to be difficult. It's a lot harder to tell when a goroutine is blocked than it is to tell when memory is not referenced. The canonical case here is a goroutine blocked reading from or writing to a channel, but the goroutine itself is holding a reference to the channel; for example, it may be an argument to the function that blocked, and therefore be on the goroutine's stack. We need to reliably detect not that the channel is unreferenced, but that the only references to the channel are from memory that is only visible to the blocked goroutine. That seems hard.

nullchinchilla commented 7 years ago

@ianlancetaylor That doesn't seem especially harder than GC in general, though? It seems about as difficult as implementing a weak-map (delete from the map if the only references are from memory rooted at the map), though admittedly it isn't something Go has.

ianlancetaylor commented 7 years ago

Weak pointers are easier: you have a pointer type that the GC explicitly ignores (except that it gets updated when an object gets moved, for systems with a moving GC, and it gets cleared when an object is freed.)

What we need for this is something different: given a goroutine G1 blocked on a channel, we need to run a GC from all roots except G1, and see whether we marked the channel. That is straightforward but too inefficient to actually use. I don't know an efficient way to implement this. Perhaps there is one.

nullchinchilla commented 7 years ago

@ianlancetaylor What if we run the GC only from the roots of running/runnable goroutines, and then collect all goroutines blocked on channels we didn't mark? (Of course, with some caveats with timers, etc) This should collect everything in one go, though it's might be hard to integrate it into Go's concurrent GC.

rsc commented 7 years ago

There are many times when a goroutine will be blocked on a channel in a data structure that is accessible to other goroutines if they follow the right chain of pointers in memory, but none of them will. That case and many other related ones fundamentally fool any GC-based algorithm, so that in many cases goroutines will not be collected, and the conditions will be very unclear, depending potentially on optimizations in the compiler that move data between stack and heap, and maybe other optimizations as well. The result will be that it's fairly unpredictable when a blocked goroutine would be collected vs not.

As Ian said, "it is essential that it be clear when goroutines will be collected, and that goroutines will be reliably collected in that state."

nullchinchilla commented 7 years ago

@rsc There are many times when a data structure is inside a datastructure that's accessible from a goroutine if it follows the right chain of pointers in memory, but it never will. So the conditions at which memory will be collected is not clear and GC is "fooled", so garbage collection is fundamentally undecidable. I'm sure you could even construct a scenario where freeing an object at the "right" moment means solving the halting problem. Better use free()!

I've programmed extensively in Racket, a language where unreachable blocked threads are collected, and it is definitely not "fairly unpredictable" whether a blocked thread will be collected. There are several common design patterns that involve blocked threads being collected, and in each case it's very clear why it's being collected, and in other cases you still close them down manually.

In Go's case, we wouldn't be removing any code to shut goroutines down at all in most existing code; we would simply have new idioms in which whether goroutines are collected or not is very obvious. This proposal isn't intended to deprecate goroutine termination devices like tombs, etc, but rather to allow usage of goroutines to implement not exactly thread-like constructs such as coroutines and generators.

nullchinchilla commented 7 years ago

Just as an example, this proposal would allow users not to call Stop() on time.Ticker. Calling stop on objects that don't manage external resources should be the GC's job.

Essentially, this proposal can make "is this object implemented by a background goroutine" a well-encapsulated implementation detail. You might be able to implement time.Ticker without needing Stop() by some magic fiddling with the runtime and scheduler, but whether you do that, or simply use a goroutine that sleeps periodically, shouldn't be exposed in the form of "does it leak memory unless you stop it".

ianlancetaylor commented 7 years ago

The issue of calling Stop on a time.Ticker is actually a different problem. Implementing this suggestion would not solve it. The problem there is that a time.Ticker has an entry in the runtime timer table, and there is nothing that removes that entry even if the time.Ticker is garbage collected.

nullchinchilla commented 7 years ago

@ianlancetaylor Ah, I always thought the reason was that time.Ticker was backed by a sleeping goroutine in a loop. But if this proposal is implemented such an implementation would indeed avoid the need of calling Stop.

golang / go