Open changkun opened 1 year ago
Note that sync
cannot import context
.
Note that sync
could not import context
because context use sync.Mutex
. This is not an absolute constraint because context
can switch to a different mutex lock to avoid importing sync
package.
Having a function just named "WithContext" in package sync seems not great, since it would be called like g, ctx := sync.WithContext(ctx)
.
That's a good point. However, naming can be revised in other liked forms, the core idea is to include errgroup.WithContext
. Other naming possibilities can be NewErrGroup
, e.g., g, ctx := sync.NewErrGroup(ctx)
.
Personally I don't think the low-level sync package is the right place for the high level errgroup concept. Let's not assume that x/sync has to match to the standard library sync.
I think this is a common enough need and has proven its worth. Most users shouldn’t use the original WaitGroup anymore. My only question would be if there should be a sync/v2 with this plus generics for Map etc.
@ianlancetaylor I am also fine if it is a sub package of sync sync/errgroup
, and then the API signature doesn't need any rename, etc. 😃
The primary purpose is to get it into stdlib, and the naming can be decided into whatever community likes.
If the decision is to have a standalone package, an emphasis comment in the WaitGroup can increase the public exposure of errgroup.
With the addition of errors.Join
in Go 1.20, would it make sense for WaitGroup.Wait
to promise to return a "join" of all of the errors returned across all of the functions?
I don't think it must necessarily use errors.Join
directly, but more that whatever value it does return could support the same errors.Is
idiom that the errors.Join
result supports, so that the caller can recognize potentially many distinct errors, which I believe just means the error type having a method Unwrap() []error
that returns all of them.
I had been joining errors in my errutil.ExecParallel helper, which uses my pre-Go 1.20 multierror type. I'm not sure though if it would work as well for errgroup.
Hi, original errgroup
author here. 👋
I would not want errgroup
to enter the standard library without some API changes. It has served us well, but now we have the benefit of hindsight.
There are two significant problems with the API:
An errgroup.WithContext
today cancels the Context
when its Wait
method returns, which makes it easier to avoid leaking resources but somewhat prone to bugs involving accidental reuse of the Context
after the call to Wait
.
The need to call Wait
in order to free resources makes errgroup.WithContext
unfortunately still somewhat prone to leaks. If you start a bunch of goroutines and then encounter an error while setting up another one, it's easy to accidentally leak all of the goroutines started so far — along with their associated Context
— by writing
return err
instead of
cancel()
g.Wait()
return err
Those can both be remedied by adding a method Stop
that cancels the Context
and also waits for in-flight goroutines to exit:
g, ctx := errgroup.New(ctx)
defer g.Stop()
…
if err != nil {
return err
}
Ideally, that would be accompanied by an improvement to the lostcancel
check in cmd/vet
to report missing calls to Stop
, and/or a finalizer that verifies that Stop
has been called by the time the group becomes unreachable.
@apparentlymart, I do not think it would be appropriate for errgroup
to join the errors together. Generally only the first error matters, because it causes the Context
to be canceled. After that happens, it can be difficult to tell whether subsequent errors are errors in their own right, or secondary effects of that cancellation.
Also, even on error and canceled context it still creates remaining goroutines.
Consider this example with 10k batch.
errg, ctx := errgroup.WithContext(context.Background())
items := []int{1, 2, 3, ...... 10000}
for _, item := range items {
item := item
errg.Go(func() error {
if err := process(ctx, item); err != nil {
return fmt.Errorf("something went wrong while processing item %d", item)
}
return nil
})
}
err := errg.Wait()
Usually simple fix would be to check for canceled context. But this doesn't always look obvious.
errg, ctx := errgroup.WithContext(context.Background())
items := []int{}
loop:
for _, item := range items {
item := item
select {
case <-ctx.Done():
break loop
default:
}
errg.Go(func() error {
if err := process(ctx, item); err != nil {
return fmt.Errorf("something went wrong while processing item %d", item)
}
return nil
})
}
err := errg.Wait()
@bcmills Do you have thoughts on how can api look to bake this cancellation into errgroup package?
Hi @anjmao if I'm understanding right what you are proposing, it's the same as https://pkg.go.dev/github.com/sourcegraph/conc/pool#ContextPool.WithCancelOnError. Maybe conc's API can be used as an inspiration for this.
WithCancelOnError configures the pool to cancel its context as soon as any task returns an error or panics. By default, the pool's context is not canceled until the parent context is canceled.
@jimen0 errgroup package already works like that and cancels context when created using WithContext. The issue that even if your context is canceled errgroup.Go will spawn new goroutine and run your function.
I do not think it would be appropriate for errgroup to join the errors together. Generally only the first error matters, because it causes the Context to be canceled. After that happens, it can be difficult to tell whether subsequent errors are errors in their own right, or secondary effects of that cancellation.
I agree, errgroup.Wait()
should definitely return the first error, but it's also reasonable to provide a way to get all of the errors too, perhaps via Errors() []error
. This is very much useful, in all the ways that Promise.allSettled
is useful in Javascript. Several of the errors would just be context.Canceled
or similar if WithContext
is used, that would be fine in practice. If I ask the errgroup for all the errors it's possible to give me, I'm happy to get what I get.
Re: @bcmills
I do not think it would be appropriate for errgroup to join the errors together. Generally only the first error matters, because it causes the Context to be canceled.
That is only true if you construct the errgroup using errgroup.WithContext
. If you use the zero value new(errgroup.Group)
, then all errors are meaningful. (And in fact I found this issue searching for a way to get all errors back in exactly this situation.)
To thread that needle, perhaps it could return errors.Join
of all errors such that errors.Is(err, context.Canceled)
returns false.
To thread that needle, perhaps it could return
errors.Join
of all errors such thaterrors.Is(err, context.Canceled)
returns false.
That's an interesting suggestion, but I'm not sure how well it would work out in practice — errors that stem from a Context cancellation don't necessarily wrap context.Canceled
, and it can be very difficult for libraries that propagate those errors to reliably convert them back.
For example, a function that performs a network operation might use context.AfterFunc
and net.Conn.SetDeadline
to cancel it, resulting in an error wrapping syscall.ETIMEDOUT
instead of context.Canceled
.
@bcmills good point.
Strictly speaking, if you returned a composite, ordered error, anyone who wanted could pick out just the first error, but that's a common enough use case that it'd be annoying to have to do the dance.
Yet another perspective is that Wait could return the first error when there is a cancelable context set up, but an errors.Join of all errors when there isn't. That should track the likely meaningfulness of the errors.
Strictly speaking, if you returned a composite, ordered error, anyone who wanted could pick out just the first error, but that's a common enough use case that it'd be annoying to have to do the dance.
That, and worse: if someone has a very long-lived errgroup
, collecting all of the errors when only one is wanted could cause a significant memory leak (to store errors that are ultimately going to be ignored).
Yet another perspective is that Wait could return the first error when there is a cancelable context set up, but an errors.Join of all errors when there isn't.
That would be possible, but it seems a little too magical to me. If we're going down this road, I would rather have an explicit call or parameter to toggle the behavior.
If we're going down this road, I would rather have an explicit call or parameter to toggle the behavior.
Works for me. It'd be much nicer than my current code that separately tracks and juggles a slice of errors...
The cleanest alternative right now is a defer
, I think:
errs := make(chan []error, 1)
errs <- nil
saveErr := func(p *error) {
if *p != nil {
errs <- append(<-errs, *p)
}
}
g.Go(func() (err error) {
defer saveErr(&err)
…
})
My own rendezvous.WaitFirstError(... TaskCtx)
(designed independently before discovering this thread) has the following features:
errgroup.WithContext
)errors.Join
ctx.Err()
) is reported as the first error as that error may have triggered the others in cascade. This ensures that if the parent context has been canceled/timeout and at least an error occurred then errors.Is(err, context.Canceled /* or Timeout*/)
is true
, but if the context has been canceled/timeout but no error occurred in task (task succeeded or cancellation occurred even before the start) then errors.Is(err, context.Canceled /* or Timeout*/)
is false
.@bcmills wrote:
That's an interesting suggestion, but I'm not sure how well it would work out in practice — errors that stem from a Context cancellation don't necessarily wrap context.Canceled, and it can be very difficult for libraries that propagate those errors to reliably convert them back.
That's the reason for my last feature.
I often find myself using an errgroup
where Go
/TryGo
and Wait
runs in different goroutines, when there's an unknown set of tasks:
for e := range events {
if !group.TryGo(task) {
// Here I need to know that the task couldn't be started
// I might e.g. need a cancellation handle for ad-hoc task spawning
}
}
In another goroutine:
<-ctx.Done()
group.Wait()
// All tasks should be complete.
With current x/sync
functionality, reuse is possible (is this a bug?):
g, ctx := errgroup.WithContext(context.Background)
g.Wait() // Done, right?
g.TryGo(fn) // Reports true and runs the goroutine, after it's waited. Risk of leaks
So I'm very much in favor of a Stop
method, especially if it can prevent accidental group reuse:
TryGo
after a group is stopped returns false forever.Go
after stop should probably panic?As an aside, would it be wise to break out and remove the semaphore features (SetLimit
) for simplicity? Concurrency is notoriously difficult already, so the bar for complexity should imo be extremely high.
My 2 cents: the problem of managing a long-running pool of workers seems very different to me than what errgroup
was designed to do. I think of errgroup
as more of a Go implementation of Javascript's Promise.All
than a worker pool. Trying to bend it to support a long-running for
loop in a server process seems to conflate two issues and unnecessarily complicate the simple concept encapsulated by this package.
I'm for it. Also (not sure it's very relevant information): this proposal was just mentioned by Yarden Laifenfeld in GopherCon Israel and the vibe in the room was of general consensus to add this to sync
as well instead of keeping it in x
since it seems so useful (even if it's rather high-level, it seems like it's the "one way to solve it" which Go likes).
As briefly discussed in https://github.com/golang/go/issues/56102#issuecomment-1317765390, I propose to promote
errgroup.Group
to thesync
package. The proposed API set is listed below.Rationale
Compared to
sync.WaitGroup
,errgroup.Group
do not require cognitive load to manageAdd()
andDone()
and can easily manage the number of concurrent tasks usingSetLimit
. For example,vs.
Tu et al. [1] also reported that
WaitGroup
is often misused and causes concurrency bugs. For instance, an example taken from Figure 9:[1] Tu, Tengfei, et al. "Understanding real-world concurrency bugs in go." Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. 2019. https://doi.org/10.1145/3297858.3304069
Existing Usage
A search over GitHub, code that uses
errgroup.Group
includes 16.1k files and pkg.go.dev shows there are 10,456 imports.APIs
Update:
WithContext
is renamed toNewErrGroupWithContext
.