dominikh / go-tools

Staticcheck - The advanced Go linter
https://staticcheck.dev
MIT License
6.12k stars 371 forks source link

staticcheck: errgroup context usage #155

Open benesch opened 7 years ago

benesch commented 7 years ago

We recently had a bug in CockroachDB (https://github.com/cockroachdb/cockroach/pull/17453) where we would sometimes return the wrong error when using an error group. Here's a simplified example:

func work(ctx context.Context, ops chan struct{/* ... */}) error {
    g, gCtx := errgroup.WithContext(ctx)

    for op := range ops {
        select {
        case <-gCtx.Done():
            // XXX NOT CORRECT
            return gCtx.Err()
        }
        g.Go(func() error {
            // Do some work.
            select {
            case <-gCtx.Done():
                // This is actually correct, because
                // g.Wait() will remember the first error
                // and return it, discarding this one.
                return gCtx.Err()
            }
            // Do some more work.
        })
    }

    if err := g.Wait(); err != nil {
        return err
    }
}

Basically, if a spawned goroutine fails before we've finished spawning all goroutines—quite likely, because operations come through the ops chan slowly—we want to avoid spawning any future goroutines and return the error from the failed goroutine. The above code returns context.Canceled instead of the useful error that caused the context to be canceled. Here's one solution:

     for op := range ops {
         select {
         case <-gCtx.Done():
-            // XXX NOT CORRECT
-            return gCtx.Err()
+            return g.Wait()
         }
         g.Go(func() error {
             // Do some work.

It's really easy to miss this in code review. The correct error-handling code looks very similar to the incorrect error-handling code... and the incorrect error-handling code is correct within g.Go. I think including this in staticcheck would be rather straightforward, though, as any call to gCtx.Err() outside of a g.Go func is a usage error. I can't think of any case where it wouldn't be an error, at least. @dominikh, would you endorse a lint for this?

/cc @tamird who might be willing to implement this check.

dominikh commented 7 years ago

I haven't given any thought to feasibility yet (is it easy to determine which error is correct, given the context? For a machine?¹) but I am not opposed to such a check.

¹: If "any call to gCtx.Err() outside of a g.Go func is a usage error" holds true, then yes, it is possible. I haven't verified that statement yet.