Open mikegleasonjr opened 1 month ago
Hi @mikegleasonjr!
Yes, the behaviour you are describing is correct and by design. Unlike the StopAndWait() method on the worker pool, the context group's Wait() method returns as soon as either one of the tasks exits with an error or all tasks complete successfully (nil error).
These two methods are meant to be used in different circumstances. Given that worker pool instances are meant to be reused across all goroutines (singleton), the StopAndWait() method is usually invoked when tearing down the application (e.g. after receiving an exit or kill signal). In this scenario, the goroutine that executes the shutdown procedure doesn't really expect to handle an error from one of the tasks that were sent to the pool; all it does is wait for any pending tasks to complete (successfully or not). This is why this method doesn't return an error object.
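To make the shutdown case concrete, here is a minimal sketch built only on the Go standard library. The simplePool type and its methods are hypothetical stand-ins written for illustration; this is not pond's implementation, it only mimics the "drain everything, return nothing" behaviour described above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// simplePool is a hypothetical, stripped-down worker pool used only for
// illustration. It is NOT pond's implementation.
type simplePool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

// newSimplePool starts a fixed number of workers that drain the task queue.
func newSimplePool(workers, capacity int) *simplePool {
	p := &simplePool{tasks: make(chan func(), capacity)}
	p.wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// Submit queues a task. Tasks have no return value, so there is no error
// for the pool to report.
func (p *simplePool) Submit(task func()) { p.tasks <- task }

// StopAndWait stops accepting tasks and blocks until every queued task has
// run, whether it succeeded or not. It returns nothing.
func (p *simplePool) StopAndWait() {
	close(p.tasks)
	p.wg.Wait()
}

// runShutdownDemo submits five tasks, one of which "fails", and reports how
// many had finished by the time StopAndWait returned.
func runShutdownDemo() int64 {
	pool := newSimplePool(2, 10)
	var finished int64
	for i := 0; i < 5; i++ {
		i := i
		pool.Submit(func() {
			if i == 2 {
				// The failing task handles its own error.
				fmt.Println("task", i, "failed (handled inside the task)")
			}
			atomic.AddInt64(&finished, 1)
		})
	}
	pool.StopAndWait()
	return atomic.LoadInt64(&finished)
}

func main() {
	fmt.Println("tasks finished before shutdown returned:", runShutdownDemo())
}
```

The shutdown goroutine only learns that the queue has been drained; anything a task wants to report has to be handled inside the task itself.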
The context group's Wait() method, on the other hand, is meant to be called after submitting a batch of inter-related tasks (e.g. uploading a collection of images within the handler of a single HTTP request), which usually also means they share a context.Context. The semantics of this method are inspired by the errgroup package. When dealing with a bunch of tasks that are tied to the same context, you usually want to fail fast and return as soon as the first error is returned. Moreover, if the group's context.Context object gets cancelled, any pending task is not executed.
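The fail-fast behaviour can be sketched with the standard library alone. The failFastGroup type below is a hypothetical illustration of the semantics described in this reply, not pond's actual code: Wait() returns on the first error, even though a slower task (like job 3 in the report) may still be running.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// failFastGroup is a hypothetical stand-in for a context-bound task group.
// It is NOT pond's implementation; it only mimics the Wait() semantics
// described above.
type failFastGroup struct {
	cancel  context.CancelFunc
	wg      sync.WaitGroup
	errOnce sync.Once
	errCh   chan error // buffered; holds the first error only
}

func newFailFastGroup(parent context.Context) (*failFastGroup, context.Context) {
	ctx, cancel := context.WithCancel(parent)
	return &failFastGroup{cancel: cancel, errCh: make(chan error, 1)}, ctx
}

// Submit runs the task in its own goroutine. The first failure is recorded
// and cancels the shared context.
func (g *failFastGroup) Submit(task func() error) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		if err := task(); err != nil {
			g.errOnce.Do(func() {
				g.errCh <- err
				g.cancel()
			})
		}
	}()
}

// Wait returns as soon as one task fails or all tasks have completed.
func (g *failFastGroup) Wait() error {
	done := make(chan struct{})
	go func() {
		g.wg.Wait()
		close(done)
	}()
	select {
	case err := <-g.errCh:
		return err
	case <-done:
		// All tasks finished; still report an error if one was recorded.
		select {
		case err := <-g.errCh:
			return err
		default:
			g.cancel()
			return nil
		}
	}
}

// runFailFastDemo reports whether the slow task was still running when
// Wait returned, together with the error Wait returned.
func runFailFastDemo() (stillRunning bool, err error) {
	group, _ := newFailFastGroup(context.Background())
	slowDone := make(chan struct{})
	group.Submit(func() error { return errors.New("job 2 failed") })
	group.Submit(func() error {
		time.Sleep(200 * time.Millisecond) // plays the role of "job 3"
		close(slowDone)
		return nil
	})
	err = group.Wait()
	select {
	case <-slowDone:
		return false, err
	default:
		return true, err
	}
}

func main() {
	stillRunning, err := runFailFastDemo()
	fmt.Println("Wait returned:", err, "| slow task still running:", stillRunning)
}
```

In this sketch the slow task ignores the context, so it keeps running after Wait() has returned. A task that watches ctx.Done() could stop early instead.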
That said, there's another kind of "group of tasks" that behaves in the way you described, and that's the one created with pool.Group(). If you create the group using this method, then the Wait() method will wait for all tasks to complete, regardless of any error. However, this kind of task group doesn't share a context.Context object and task functions cannot return an error (each task is expected to handle errors internally).
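As a rough illustration of the wait-for-everything behaviour, the sketch below uses a plain sync.WaitGroup. The waitAllGroup type is hypothetical and is not how pool.Group() is implemented; it only shows tasks without an error return, each recording its own failure, and a Wait() that blocks until all of them are done.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// waitAllGroup is a hypothetical stand-in for a task group whose Wait()
// blocks until every task is done. It is NOT pond's implementation.
type waitAllGroup struct {
	wg sync.WaitGroup
}

// Submit runs a task that has no return value.
func (g *waitAllGroup) Submit(task func()) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		task()
	}()
}

// Wait blocks until all submitted tasks have completed.
func (g *waitAllGroup) Wait() { g.wg.Wait() }

// runWaitAllDemo runs three jobs, one of which fails, and reports how many
// completed and which failures the jobs recorded themselves.
func runWaitAllDemo() (completed int64, failures []error) {
	var (
		group waitAllGroup
		mu    sync.Mutex // guards failures
	)
	for i := 1; i <= 3; i++ {
		i := i
		group.Submit(func() {
			if i == 2 {
				// The task handles its own error, here by recording it.
				mu.Lock()
				failures = append(failures, fmt.Errorf("job %d failed", i))
				mu.Unlock()
			}
			atomic.AddInt64(&completed, 1)
		})
	}
	group.Wait()
	return completed, failures
}

func main() {
	completed, failures := runWaitAllDemo()
	fmt.Println("completed:", completed, "failures:", failures)
}
```

Because Wait() only returns once every job has finished, any resources a job hands back can be collected and released by the caller afterwards.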
Given this test:
It has the following output:
If we uncomment pool.StopAndWait() or time.Sleep(100 * time.Millisecond), the test passes:
What should happen:
What happens:
group.Wait() is not waiting for job 3.
Side effects:
If job 3 creates and returns resources that must be freed (like an io.Closer), it causes a memory leak because the pool manager never learns that it ended and returned something. It is not up to the worker to check whether the pool stopped before returning its value to the pool and to do the cleanup itself.
What should happen:
group.Wait() should behave like pool.StopAndWait() (minus the stop).
Temporary fix: