alitto / pond

🔘 Minimalistic and High-performance goroutine worker pool written in Go

Closing a pool early does not trigger the panic handler. #5

Closed polds closed 4 years ago

polds commented 4 years ago

I'm playing around with this library, which looks very promising, and ran into an issue where the panic handler doesn't get called if you close the pool early. My use case is trying to time out a worker pool. However, when I "time out" the pool, it straight up panics with panic: send on closed channel.

This is my sample code:

package main

import (
    "fmt"
    "time"

    "github.com/alitto/pond"
)

func h(p interface{}) {
    fmt.Printf("Catching panic: %v\n", p)
}

func main() {
    p := pond.New(5, 0, pond.PanicHandler(h))
    // Simulate a timeout event of some sort.
    time.AfterFunc(time.Second, func() {
        p.StopAndWait()
    })
    for i := 0; i < 100; i++ {
        i := i
        p.Submit(func() {
            fmt.Printf("Handling task %d\n", i)
            time.Sleep(250 * time.Millisecond)
        })
    }
    p.StopAndWait()
}
alitto commented 4 years ago

Hey @polds!

A couple of comments to explain the behaviour you're seeing:

  1. The pool created in your sample is a "blocking pool" because maxCapacity is set to 0 (the 2nd argument). This means the buffered channel used to queue tasks submitted to the pool has size 0, so every time a client goroutine calls Submit() on the pool, it blocks until a worker grabs the task from the queue (receives from that channel).
  2. Given that the pool is blocking, when the main goroutine tries to send the 6th task, it waits until one of the workers becomes available again. When p.StopAndWait() is called from the timeout function, the task queue channel is closed, so when the main goroutine attempts to continue submitting tasks, it panics with panic: send on closed channel. Notice this panic is thrown in the caller goroutine, not inside one of the tasks. That is why the panic handler is not called: it is only meant to catch panics inside task functions (see the short sketch after this list).
  3. The StopAndWait() method is intended for a graceful termination of the pool (i.e. stop accepting tasks but finish all running and queued tasks), but it looks like you need the pool to stop immediately and discard any queued/pending tasks. If that's the case, you can use Stop() instead, which has that behaviour.
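
For reference, here's a minimal sketch (using the same API as your sample) showing that the handler only fires for panics raised inside a task function, not for panics in the caller goroutine:

package main

import (
    "fmt"

    "github.com/alitto/pond"
)

func main() {
    p := pond.New(1, 10, pond.PanicHandler(func(v interface{}) {
        fmt.Printf("Recovered panic from a task: %v\n", v)
    }))
    // The panic happens inside the task, so the panic handler catches it
    p.Submit(func() {
        panic("boom")
    })
    p.StopAndWait()
}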

Here's a modified version of your sample that might behave similarly to what you expect:

package main

import (
    "fmt"
    "time"

    "github.com/alitto/pond"
)

func h(p interface{}) {
    fmt.Printf("Catching panic: %v\n", p)
}

func main() {
    // Create a non-blocking pool with a buffered task queue (maxCapacity = 100)
    p := pond.New(5, 100, pond.PanicHandler(h))
    quit := make(chan struct{})
    // Simulate a timeout event of some sort.
    time.AfterFunc(time.Second, func() {
        // Stop the pool immediately and discard queued tasks
        p.Stop()
        quit <- struct{}{}
    })
    for i := 0; i < 100; i++ {
        i := i
        // Attempt to submit the task, or silently discard it if the pool is stopped
        p.TrySubmit(func() {
            fmt.Printf("Handling task %d\n", i)
            time.Sleep(250 * time.Millisecond)
        })
    }
    <-quit
}

Alternatively, if you need to set a maximum waiting time for each task to be picked up by a worker, you could use SubmitBefore() instead of Submit(), which only executes the task if a worker grabs it within the specified time and skips it otherwise.
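
A quick sketch of that approach (assuming the SubmitBefore(task, deadline) form; the 500ms deadline is just illustrative):

package main

import (
    "fmt"
    "time"

    "github.com/alitto/pond"
)

func main() {
    p := pond.New(5, 100)
    for i := 0; i < 100; i++ {
        i := i
        // Skip the task if no worker picks it up within 500ms
        p.SubmitBefore(func() {
            fmt.Printf("Handling task %d\n", i)
            time.Sleep(250 * time.Millisecond)
        }, 500*time.Millisecond)
    }
    p.StopAndWait()
}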

Thanks for reaching out! Please let me know if that works for you.

polds commented 4 years ago

Thanks for the response! I'm looking for a way to expire the entire queue as a whole after a timeout, and not necessarily individual tasks.

Also, thanks for your example code. Unfortunately, it only ever processes maxWorkers number of items and blocks until the timeout kicks in, regardless of the maxCapacity value set. It also has the issue that if everything completes before the timeout, it still waits for the timeout anyway.

What I'm ultimately looking for is a way to throw 1,000s of tasks at a queue, batch them in groups of 50 or whatever and drop whatever doesn't complete within the timeout (to be rescheduled at a later date).
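
Roughly something like the sketch below (pond aside, just the standard library, with made-up sizes and durations): run tasks in batches, and once the overall deadline passes, collect everything that hasn't started yet so it can be rescheduled.

package main

import (
    "context"
    "fmt"
    "sync"
    "time"
)

func main() {
    const batchSize = 50
    taskIDs := make([]int, 1000)
    for i := range taskIDs {
        taskIDs[i] = i
    }

    ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
    defer cancel()

    var leftover []int
    for start := 0; start < len(taskIDs); start += batchSize {
        end := start + batchSize
        if end > len(taskIDs) {
            end = len(taskIDs)
        }
        if ctx.Err() != nil {
            // Deadline reached: everything not yet started gets rescheduled later
            leftover = append(leftover, taskIDs[start:]...)
            break
        }
        var wg sync.WaitGroup
        for _, id := range taskIDs[start:end] {
            id := id
            wg.Add(1)
            go func() {
                defer wg.Done()
                fmt.Printf("Handling task %d\n", id)
                time.Sleep(250 * time.Millisecond)
            }()
        }
        // Finish the current batch before starting the next one
        wg.Wait()
    }
    fmt.Printf("%d tasks left to reschedule\n", len(leftover))
}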

We can close this issue, though, since my original point was about the panic handler not doing what I thought it should.