haaawk opened this issue 2 days ago
I sent a patch here: https://go-review.googlesource.com/c/sync/+/627075
Change https://go.dev/cl/627075 mentions this issue: x/sync/errgroup: ensure all goroutines finish before Wait returns
go g.Go(
I don't think this is what you meant to write.
It is exactly what I meant to write. Otherwise the g.Go would block.
Your program contains a data race, because it's not valid to increment a waitgroup with 0 count while waiting on it. Your submitted patch doesn't change this.
The waitgroup has a count of 1 when go g.Go is called.
No. You would need additional synchronization to guarantee this.
package main

import (
    "runtime"

    "golang.org/x/sync/errgroup"
)

func main() {
    runtime.GOMAXPROCS(1)
    g := &errgroup.Group{}
    g.SetLimit(1)
    ch := make(chan struct{})
    wait := make(chan struct{}, 2)
    insideGo := make(chan struct{})
    g.Go(func() error {
        <-ch
        wait <- struct{}{}
        return nil
    })
    go g.Go(func() error {
        close(insideGo)
        println("I'm not blocked")
        wait <- struct{}{}
        return nil
    })
    println("Ok let's play")
    close(ch)
    <-insideGo
    g.Wait()
    println("It's over already?")
    <-wait
    <-wait
}
@jrick is right. go g.Go(...) is not what you want. g.Go is intentionally blocking to limit the number of active goroutines. You've called g.SetLimit(1), so that limits it to 1 active goroutine at a time. If you don't want that limit, you can remove that line, or set a higher limit.
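To make that blocking behavior concrete, here is a minimal sketch (mine, not from this thread): with SetLimit(1), the second call to g.Go blocks until the first function returns and releases the only slot.

package main

import (
    "fmt"
    "time"

    "golang.org/x/sync/errgroup"
)

func main() {
    g := &errgroup.Group{}
    g.SetLimit(1)

    g.Go(func() error {
        time.Sleep(100 * time.Millisecond)
        fmt.Println("first done")
        return nil
    })

    // Blocks until the first function returns and frees the single slot.
    g.Go(func() error {
        fmt.Println("second done")
        return nil
    })

    if err := g.Wait(); err != nil {
        fmt.Println("error:", err)
    }
    fmt.Println("all done")
}

Because both g.Go calls run in the same goroutine as g.Wait, there is no race here: the program always prints "first done", "second done", "all done" in that order.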
In your case, there is no synchronization between g.Wait and the g.Go call in the new goroutine created by go g.Go(...). It is possible that g.Wait has finished while the new goroutine created by go g.Go(...) hasn't run. So this works as intended. Your CL doesn't seem to change it either: g.Wait can still finish even before the new goroutine reaches g.Go.
In general, if you have questions about how to use the API, please see https://go.dev/wiki/Questions . Thanks.
Perhaps I misunderstand, but I think the Group is working as intended. SetLimit may prevent the Group from accepting new work, and the client must deal with that. It definitely cannot call Go asynchronously as there would be no guarantee that it happened before the later call to Wait.
Perhaps the documentation could be improved, but I don't think there's a bug.
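One way a client might deal with a group that won't accept more work, sketched here as an illustration (the release channel is my addition, used only to make the outcome deterministic): TryGo reports false instead of blocking, leaving the fallback to the caller.

package main

import (
    "fmt"

    "golang.org/x/sync/errgroup"
)

func main() {
    g := &errgroup.Group{}
    g.SetLimit(1)

    release := make(chan struct{})
    g.Go(func() error {
        <-release // hold the single slot until main lets go
        return nil
    })

    // TryGo never blocks; it reports whether the group accepted the work.
    if !g.TryGo(func() error { return nil }) {
        fmt.Println("group is full; queue the task, retry, or run it elsewhere")
    }

    close(release)
    if err := g.Wait(); err != nil {
        fmt.Println("error:", err)
    }
}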
Ah, @cherrymui beat me to it!
This was just my failed attempt to have a simple reproducer of the issue that's still there. In the code below the order is always right and the program still prints the same result. The problem is that g.Go can wake up from sem in a group that was already waited on. Let me think of a less artificial example and come back. It will unfortunately need to be more complex.
package main

import (
    "runtime"
    "time"

    "golang.org/x/sync/errgroup"
)

func main() {
    runtime.GOMAXPROCS(1)
    g := &errgroup.Group{}
    g.SetLimit(1)
    ch := make(chan struct{})
    wait := make(chan struct{}, 2)
    g.Go(func() error {
        <-ch
        wait <- struct{}{}
        return nil
    })
    go g.Go(func() error {
        println("I'm not blocked")
        wait <- struct{}{}
        return nil
    })
    println("Ok let's play")
    time.Sleep(1 * time.Second)
    close(ch)
    g.Wait()
    println("It's over already?")
    <-wait
    <-wait
}
In the code below the order is always right and the program still prints the same result.
This program has a race: the second call to Go races with Wait. Consider: the goroutine created by the first Go function could complete (along with println, Sleep, close) before the goroutine created by the go statement even begins to execute the second call to g.Go.
I know it's not properly synchronized and I never said it was. I said that the order is right. I wouldn't expect the runtime to stay idle during the sleep and not execute the runnable goroutine.
I guess a race free reproducer could be:
package main

import (
    "runtime"
    "runtime/pprof"
    "strings"
    "time"

    "golang.org/x/sync/errgroup"
)

func main() {
    runtime.GOMAXPROCS(1)
    g := &errgroup.Group{}
    g.SetLimit(1)
    ch := make(chan struct{})
    wait := make(chan struct{}, 2)
    go func() {
        println("Starting task scheduler")
        println("Scheduling first task")
        g.Go(func() error {
            <-ch
            println("First task ran")
            wait <- struct{}{}
            return nil
        })
        println("Scheduling second task")
        g.Go(func() error {
            println("What about me?")
            wait <- struct{}{}
            return nil
        })
    }()
    time.Sleep(1 * time.Second)
    var b strings.Builder
    pprof.Lookup("goroutine").WriteTo(&b, 1)
    close(ch)
    if strings.Contains(b.String(), "errgroup.go:71") {
        g.Wait()
        println("Game over")
    } else {
        println("Second task not blocked yet")
    }
    <-wait
    <-wait
}
https://go.dev/play/p/Zg5GpZJz2bK
but I have to admit, there's probably no way to reproduce the problem in a completely race-free way. If g.Go and g.Wait are in two different goroutines, then the runtime can always preempt g.Wait in the middle, then schedule g.Go to run and call g.wg.Add(1) while g.Wait is running with the counter equal to 0.
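That race can be distilled to a plain sync.WaitGroup. This is my reduction, not code from the thread; go run -race can flag it:

package main

import "sync"

func main() {
    var wg sync.WaitGroup
    go func() {
        wg.Add(1) // can increment the counter from 0 while Wait is running
        wg.Done()
    }()
    wg.Wait() // nothing orders this after (or before) the Add above
}

Nothing establishes a happens-before edge between the Add and the Wait, which is exactly the pattern the WaitGroup documentation rules out.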
I guess it would be cleaner if the docs said explicitly that g.Wait has to be called only after all calls to g.Go finish. This is not obvious without looking into the implementation. One could imagine an implementation that does not have a data race when g.Wait and g.Go are called concurrently.
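For instance, a purely hypothetical sketch (mine, and deliberately not how errgroup works): a group built on a mutex and condition variable has no data race under concurrent Go and Wait, although Wait can still return before a concurrent Go registers its task, so the semantic problem remains.

package main

import "sync"

// group is a hypothetical errgroup-like type whose Go and Wait can run
// concurrently without a data race.
type group struct {
    mu sync.Mutex
    cv *sync.Cond
    n  int // number of active tasks, guarded by mu
}

func newGroup() *group {
    g := &group{}
    g.cv = sync.NewCond(&g.mu)
    return g
}

func (g *group) Go(f func()) {
    g.mu.Lock()
    g.n++
    g.mu.Unlock()
    go func() {
        defer func() {
            g.mu.Lock()
            g.n--
            g.mu.Unlock()
            g.cv.Broadcast()
        }()
        f()
    }()
}

func (g *group) Wait() {
    g.mu.Lock()
    for g.n > 0 {
        g.cv.Wait()
    }
    g.mu.Unlock()
}

func main() {
    g := newGroup()
    g.Go(func() { println("task ran") })
    g.Wait()
}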
It's fine to call Go concurrently with Wait, and indeed useful, if one item of work might add others to the queue. But you can't use SetLimit in this scenario or else the Group will stop accepting work. We could certainly document that, but we should not try to disallow concurrent calls to Go.
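A sketch of that useful case (my illustration, with no limit set): a parent task submits children to the same group before it returns, so the group's counter never touches zero while work remains, and calling Go concurrently with Wait is safe.

package main

import (
    "fmt"

    "golang.org/x/sync/errgroup"
)

func main() {
    g := &errgroup.Group{}

    // The parent calls g.Go for each child before returning, so the
    // counter is still nonzero when every child is added and Wait
    // cannot finish early.
    g.Go(func() error {
        for i := 0; i < 3; i++ {
            i := i
            g.Go(func() error {
                fmt.Println("child", i)
                return nil
            })
        }
        return nil
    })

    if err := g.Wait(); err != nil {
        fmt.Println("error:", err)
    }
}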
It is a very unintuitive requirement which is really driven by implementation details, not the best API design, but you're right.
I think your criticism of the design is unfair. The doc comment for SetLimit says that it "limits the number of active goroutines", and that "any subsequent call to the Go method will block ...". The alternative that you propose would remove the blocking behavior, which would be an incompatible change (since it removes happens-before edges), and it would require that Group maintain an arbitrarily large queue of functions provided to Go before a goroutine is available to run them, which could cause an application to use unbounded memory when dealing with very long streams of tasks.
So I'm not advocating for my change any more at all. My criticism is about the fact that calling Wait and Go concurrently with no synchronization is a data race, and the docs don't say a word about it. Calling Go from a goroutine running inside a group is a form of synchronization. The requirement is really specific: "You can call Go concurrently with Wait only if you have guaranteed that at least 1 other goroutine that is already running inside the group won't finish until after either Go or Wait finishes first - or both."
True, but the exact same principle applies to plain old WaitGroups: to add an item to the group you call Add(1) and later Done. In the simple case you make all the Add calls in sequence and all the Dones happen asynchronously. In more complex cases a task begets more tasks, so it calls Add (for the child) before calling Done for itself, preserving the invariant. But you get into trouble if you make the Add asynchronous to the Wait without otherwise ensuring that the semaphore is nonzero.
Perhaps we could add some clarifying documentation.
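In plain sync.WaitGroup terms, the invariant described above might look like this sketch of mine: each task's Add for its child executes while the task's own count is still held.

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup

    var task func(depth int)
    task = func(depth int) {
        defer wg.Done() // runs only after this task's own Add calls
        if depth == 0 {
            return
        }
        wg.Add(1) // Add for the child while the parent's count is still held
        go task(depth - 1)
    }

    wg.Add(1) // the initial Add happens before Wait, in the same goroutine
    go task(3)
    wg.Wait()
    fmt.Println("all tasks done")
}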
Yes, but WaitGroup has the following in its docs:
Note that calls with a positive delta that occur when the counter is zero must happen before a Wait. Calls with a negative delta, or calls with a positive delta that start when the counter is greater than zero, may happen at any time. Typically this means the calls to Add should execute before the statement creating the goroutine or other event to be waited for. If a WaitGroup is reused to wait for several independent sets of events, new Add calls must happen after all previous Wait calls have returned.
which gives a user a chance to use the API correctly without looking into its implementation.
BTW, I came up with this issue because I found the following code, outside of the group's goroutines, in the codebase I'm working on:
if !w.g.TryGo(f) {
    go w.g.Go(f)
}
and it seemed fishy from a concurrency point of view, so I kept digging. Apparently someone was misled by the errgroup.Group docs/API.
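If the intent of that snippet was to avoid blocking the caller, one race-free alternative (a sketch under that assumption; the queue and the worker type are my additions, not from the codebase above) is a buffered channel drained by a single long-lived group goroutine, so Go is never called concurrently with Wait:

package main

import (
    "fmt"

    "golang.org/x/sync/errgroup"
)

type worker struct {
    g     errgroup.Group
    tasks chan func() error
}

func newWorker() *worker {
    w := &worker{tasks: make(chan func() error, 64)}
    // One goroutine inside the group drains the queue; it is the only
    // caller of the tasks themselves.
    w.g.Go(func() error {
        for f := range w.tasks {
            if err := f(); err != nil {
                return err
            }
        }
        return nil
    })
    return w
}

// submit does not block while the buffer has room. Callers must stop
// submitting before wait is called.
func (w *worker) submit(f func() error) {
    w.tasks <- f
}

func (w *worker) wait() error {
    close(w.tasks)
    return w.g.Wait()
}

func main() {
    w := newWorker()
    w.submit(func() error { fmt.Println("task ran"); return nil })
    if err := w.wait(); err != nil {
        fmt.Println("error:", err)
    }
}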
Reopening as a doc change request.
(Edit: skip down to https://github.com/golang/go/issues/70284#issuecomment-2470418828; this is now a doc change request. --@adonovan)
Go version
go version go1.23.1 darwin/amd64
Output of go env in your module/workspace:

What did you do?

When the following code is run:
https://go.dev/play/p/xTIsT1iouTd
What did you see happen?
The program printed:
What did you expect to see?
The program printing: