brandonhilkert / sucker_punch

Sucker Punch is a Ruby asynchronous processing library using concurrent-ruby, heavily influenced by Sidekiq and girl_friday.
MIT License
2.65k stars 114 forks source link

What's the status of concurrent version? #143

Closed kt0 closed 8 years ago

kt0 commented 8 years ago

Is there any known issue with concurrent version?

I've tested with high load and it working great (way better comparing to celluloid), but unlike celluloid version it doesn't gracefully shutdown the thread over app exit. We are using sucker_punch to push websocket messages after request completed asynchronously, and also have unicorn + unicorn-worker-killer along the way.

I've haunt down the problem and reached to this documentation, it seems concurrent-ruby is terminating pool on at_exit if auto_terminate wasn't set to false (default is true).

I'm not sure it is intended behaviour for sucker_punch or just the default?

kt0 commented 8 years ago

I've created a PR for this, https://github.com/brandonhilkert/sucker_punch/pull/144

jdantonio commented 8 years ago

Please define "doesn't gracefully shutdown the thread over app exit." The at_exit handler in concurrent-ruby does shutdown the thread pool on exit. That's what it's there for.

kt0 commented 8 years ago

@jdantonio By at_exit I mean ruby process at exit time and Kernel.at_exit (I'm not a long time rubist!), the case here is unicorn killing a worker, and ThreadPollExecuter suddenly terminating threads, therefore jobs aren't done, adding block to at_exit queue is still there but is shutdown. In celluloid it worked correctly.

BTW, I was in process of creating issue on concurrent-ruby to ask what is auto_terminate and is it safe to set it false?

kt0 commented 8 years ago

I've managed to get it working with Concurrent::AtExit, but it's a bit hacky, SuckerPunch::Queue register itself to AtExit in module construction before concurrent get time to attach itself.

require 'concurrent/utility/at_exit'

module SuckerPunch
  class Queue
    # ...
    ::Concurrent::AtExit.add{self.shutdown}

    def self.clear
      self.shutdown
      QUEUES.clear
      SuckerPunch::Counter::Busy.clear
      SuckerPunch::Counter::Processed.clear
      SuckerPunch::Counter::Failed.clear
    end

    def self.shutdown
      QUEUES.each_pair do |_, pool|
        pool.shutdown
        pool.wait_for_termination
      end
    end
    # ...
  end
end
jdantonio commented 8 years ago

I've never used unicorn but I think I understand your issue. And, yes, calling #shutdown on each thread pool is probably the best option.

How to handle thread pools at application exit has been topic of much debate. Personally, I dislike the practice of killing threads. There is no way to predict what state a running thread is in or what resources it is accessing (I've written about this before but it's a common enough question that I should write a blog post). One problem is what to do about a thread that is hopelessly blocked and unable to recover. I consider that to be an application problem and not the responsibility of concurrent-ruby. The bigger problem, and the one germane to this discussion, is that killing threads on exit is what MRI does (see sample programs below). After much discussion we decided that the best practice for concurrent-ruby is to have thread pools behave at application exit in a manner consistent with MRI Ruby, even if we think that behavior is wrong. Subsequently, when the application exits we kill the threads. :disappointed:

That being said, the best practice for asynchronous applications--in any language--is to ensure that all in-process tasks are handled properly when the application exits. This is the reason the JVM behaves the way I explained over here. The JVM forces the programmer to explicitly handle all in-process jobs before shutting down. For someone new to concurrent programming the behavior of the JVM can be surprising but it's actually a much safer practice.

That being said, my recommendation is for Sucker Punch to add one or more configuration options regarding application exit. Then during initialization create the necessary at_exit handlers to ensure all its thread pools are shut down appropriately. In that case, setting auto_terminate: false would be correct.

PS: I recommend always passing a timeout value to #wait_for_termination. Without a timeout value it will block indefinitely, which could be forever.

Sample Code

For fun, run the following code on MRI and JRuby and see how they behave. Then uncomment the sleep statement and see what happens. Without the sleep statement blocking the main thread the script will kill the threads and exit and none of the "Thread x..." print statement will run. With the sleep statement everything will run as expected. This behavior will be the same on both MRI and JRuby.

#!/usr/bin/env ruby

puts "Spawning threads..."

1000.times do |i|
  Thread.new do
    sleep(10)
    print("Thread #{i} exiting\n")
  end
end

#sleep(15)

puts "Exiting program."

Now try this code. On MRI the program will exit gracefully after the sleep has expired. All the jobs will have run. On JRuby, the behavior will be different. All the jobs will run and the sleep statement will expire. The console will display "Exiting program." and then the application will hang. Forever. The JVM will not shutdown. You will have to kill the process manually. :sob:

#!/usr/bin/env ruby

require 'concurrent'

puts "Spawning threads..."

options = {
  min_threads: 1000,
  max_threads: 1000,
  auto_terminate: false
}
pool = Concurrent::ThreadPoolExecutor.new(options)

1000.times do |i|
  pool.post do
    sleep(10)
    print("Thread #{i} exiting\n")
  end
end

sleep(15)

puts "Exiting program."
openface commented 8 years ago

At the risk of stating the obvious, I think there are 3 main "at exit" scenarios in the application space that should be accommodated by way of a configuration option.

  1. Hard termination - where all jobs are terminated (including those currently running or queued).
  2. Soft termination - queued jobs are terminated, but currently running jobs are allowed to complete.
  3. No termination - queued and running jobs are allowed to complete.

Ideally, this level of configuration could be set in an initializer for suckerpunch.

brandonhilkert commented 8 years ago

@kt0 re: status: the following needs to be done before beta is released: https://github.com/brandonhilkert/sucker_punch/pull/145

@jdantonio Thanks for chiming in. As always, I appreciate your feedback. Sounds like there could be a variety of needs and the default in sucker_punch could be the case that @kt0 describes.

@openface Thanks for the summary, I agree with your assessment. These seem to be the most common scenarios, 2 being the sane default IMO.