Shutdown_timeout not working properly

brandonhilkert / sucker_punch

Sucker Punch is a Ruby asynchronous processing library using concurrent-ruby, heavily influenced by Sidekiq and girl_friday.

MIT License


Shutdown_timeout not working properly #174

Closed brandonhilkert closed 8 years ago

brandonhilkert commented 8 years ago

#!/usr/bin/env ruby

require 'bundler'
Bundler.require(:default)

SuckerPunch.shutdown_timeout = 2

class CountdownJob
  include SuckerPunch::Job

  def perform(i)
    sleep 0.1
    print "Executing job #{i}\n"
  end
end

puts "Enqueuing 100 jobs..."

100.times { |i| CountdownJob.perform_async(i) }
sleep 0.5

@jdantonio, whether shutdown_timeout is set to 1 or 1000, the behavior is the same.

Here's the behavior keeping the default 8 sec. wait:

[screenshot: shutdown]

Here's the behavior setting the shutdown_timeout to 1000 (a long time):

[screenshot: shutdown2]

It's the same, no matter the configuration.

If I comment out:

        queues.each do |queue|
          queue.post(latch) { |l| l.count_down }
          queue.shutdown
        end

The script then halts at the latch countdown, but the workers stop processing anything for the duration of the timeout (default 8 sec.):

[screenshot: shutdown3]

I'm confused as to what's going on, maybe it makes sense to you ;)

jdantonio commented 8 years ago

Well that's not the way it's supposed to work, is it? :-)

I'll have to pull the code this weekend and take a look at it more deeply. I don't see anything obvious in the code above, and I know we wrote tests for this. The first thing I'm going to look at is the at_exit handlers. I can't remember, did you turn off auto termination on your thread pools?

It's a long story, but... When the main thread finishes, MRI will unceremoniously kill all of its threads. The JVM, on the other hand, won't exit until all thread pools are explicitly shut down. On JRuby we map our thread pools to java.util.concurrent.ThreadPoolExecutor. So on JRuby, when the main thread exits, the JVM will hang forever unless the thread pools are shut down. This was a real problem for our global thread pools. To solve this I used Ruby's at_exit handlers. When a c-r thread pool is initialized I register an at_exit handler designed to make MRI and JRuby behave identically. We refer to this as "auto termination" and it can be toggled on and off. Needless to say, it was a real PITA.

Whenever someone runs a console app and the exit behavior is unexpected, I first look at auto termination. I don't know why that would be a problem here--the latch should block the main thread. But that's where I'll start looking.
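For readers unfamiliar with the idea, here is a minimal sketch of what "auto termination" via at_exit could look like, using only Ruby's standard library. TinyPool, its `grace` parameter, and `shutdown_and_wait` are hypothetical names invented for illustration; this is not concurrent-ruby's implementation.

```ruby
# Hypothetical stand-in for a thread pool; NOT concurrent-ruby's code.
class TinyPool
  attr_reader :completed

  def initialize(size: 2, auto_terminate: true, grace: 1.0)
    @tasks = Queue.new
    @completed = Queue.new # thread-safe collection of task results
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop returns nil once the queue is closed and empty,
        # which ends the worker loop.
        while (task = @tasks.pop)
          @completed << task.call
        end
      end
    end
    # "Auto termination": drain the pool when the process exits, so that
    # MRI does not kill the workers the moment the main thread finishes.
    at_exit { shutdown_and_wait(grace) } if auto_terminate
  end

  def post(&task)
    @tasks << task
  end

  # Stops accepting work, then waits up to `timeout` seconds in total.
  # Returns true if every worker finished in time.
  def shutdown_and_wait(timeout)
    @tasks.close unless @tasks.closed?
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @workers.all? do |worker|
      remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
      worker.join([remaining, 0].max)
    end
  end
end
```

With `auto_terminate: false`, nothing is registered with at_exit and the caller is responsible for calling `shutdown_and_wait` itself, which is the arrangement the question about turning auto termination off is probing for.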

brandonhilkert commented 8 years ago

I did turn them off. Here's the initialization for the pools:

    DEFAULT_EXECUTOR_OPTIONS = {
      min_threads:     2,
      max_threads:     2,
      idletime:        60, # 1 minute
      max_queue:       0, # unlimited
      auto_terminate:  false # Let shutdown modes handle thread termination
    }.freeze

Thanks for digging in!

jdantonio commented 8 years ago

The culprits are lines 53 and 54 in job.rb. Note the comment "break if shutdown began while I was waiting in the queue" on line 53. The jobs are successfully posted to the queue. When the app shuts down, all the jobs in the queue are run. But all jobs in the queue are being short-circuited by the return statement on line 54. The latch in shutdown_all is being triggered because we are posting directly to the thread pool, which means it skips the short-circuit in the __run_perform function.
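The interaction described above can be reproduced in a small, simplified model built on Ruby's standard library. ModelPool, its `guard:` option, and the method bodies are hypothetical and written only for illustration; they are not the sucker_punch source, and a single worker thread is used to keep the model deterministic.

```ruby
# Simplified single-worker model of the shutdown behavior described above.
# All names here are hypothetical; this is NOT sucker_punch's code.
class ModelPool
  attr_reader :ran, :skipped

  def initialize(guard: true)
    @guard = guard
    @shutting_down = false
    @tasks = Queue.new
    @ran = []
    @skipped = []
    @worker = Thread.new do
      while (task = @tasks.pop) # nil once the queue is closed and drained
        task.call
      end
    end
  end

  # Stands in for perform_async: the job body is wrapped in a shutdown guard.
  def perform_async(id)
    job = lambda do
      if @guard && @shutting_down
        @skipped << id # short-circuit: dequeued but never performed
        return
      end
      sleep 0.05
      @ran << id
    end
    @tasks << job
  end

  # Stands in for shutdown_all: the latch task is posted straight to the
  # queue, so it bypasses the guard. Returns true if the latch was released
  # within `timeout` seconds.
  def shutdown(timeout)
    @shutting_down = true
    latch = Queue.new
    @tasks << -> { latch << :released }
    @tasks.close
    waiter = Thread.new { latch.pop }
    !waiter.join(timeout).nil?
  end
end
```

In this model, with `guard: true` the queued jobs are skipped as soon as shutdown begins, so the latch is released almost immediately and the timeout value never matters, whether it is 1 or 1000. With `guard: false` the queued jobs keep running until the latch task is reached or the timeout expires, which is the behavior one would expect once the guard is removed.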

It's been a while since we worked on this so I can't remember exactly what the intended shutdown behavior was, but the code is pretty deliberate. It looks like it's doing exactly what it's supposed to be doing. Perhaps the intent was different?

// @eileencodes

brandonhilkert commented 8 years ago

@jdantonio Makes sense! I've removed it. I believe it was prior to having more formal shutdown behavior.

@eileencodes I've released 2.0.2. Let me know if this does what you expect.