Closed brandonhilkert closed 8 years ago
You'll have to manage in-queue jobs within Sucker Punch.
The difference between #kill and #shutdown is how they handle thread exit. Neither method allows jobs in the queue to be processed. Decisions about the queue are deliberately left to the application; there are too many possible states and too many possible options for them to be handled effectively by the thread pool itself.
The difference between the two methods is that #kill attempts a language-level kill of every thread in the pool, regardless of its operating state, while #shutdown allows each thread to finish its current job but instructs it not to grab any more jobs from the queue. The risk of #kill is that threads may be killed while the system is in an inconsistent or intermediate state; if a thread is connected to external resources, the connections may be severed immediately. The advantage of #kill is that the pool should stop very quickly. The risk of #shutdown is that one or more long-running jobs may "hang" the shutdown process for an indeterminate amount of time; there is no way to know when those jobs may complete. So #shutdown should never leave the system in an inconsistent state, but it may take a long (and nondeterministic) time.
The most thorough practice for shutting down a thread pool is to combine the two methods:

1. #shutdown to start an orderly cessation of processing
2. #wait_for_termination with a reasonable timeout
3. #kill to brutally kill the stragglers
4. #wait_for_termination again, with a shorter timeout, to let the kill finish

To manage the remaining enqueued jobs I recommend setting a global on/off flag and having your job-posting function check this flag. Combine this with a special "stop" job to indicate that the thread pools are done; a CountDownLatch can help coordinate this.
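That four-step recipe can be sketched in plain Ruby, using only stdlib threads so it runs without concurrent-ruby; with a real Concurrent::ThreadPoolExecutor the equivalent calls would be pool.shutdown, pool.wait_for_termination(timeout), and pool.kill. The queue, flag, and nil sentinels here are illustrative:

```ruby
# Stdlib-only sketch of the four-step shutdown recipe. With concurrent-ruby
# the equivalents are pool.shutdown, pool.wait_for_termination(timeout),
# and pool.kill; here we emulate them with Thread, Queue, and a flag.
queue   = Queue.new
running = true

workers = 2.times.map do
  Thread.new do
    # Grab jobs only while the flag is up; a nil sentinel unblocks #pop.
    while running && (job = queue.pop)
      job.call
    end
  end
end

5.times { queue << -> { sleep 0.01 } }   # some in-flight work

# Step 1: orderly cessation -- stop grabbing new jobs.
running = false
workers.size.times { queue << nil }      # wake any worker blocked in #pop

# Step 2: wait for termination with a reasonable shared timeout.
deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + 2
workers.each do |t|
  remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
  t.join([remaining, 0].max)
end

# Step 3: brutally kill the stragglers...
workers.each { |t| t.kill if t.alive? }

# Step 4: ...and wait briefly for the kill to finish.
workers.each { |t| t.join(0.5) }
```

Note that jobs still sitting in the queue when the flag drops are simply discarded, matching the #shutdown semantics described above.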
My last comment, I promise! :-)
The reason these issues didn't come up with Celluloid is that it's a very opinionated framework. The implementers had to deal with these issues, too. But to make life simpler for their users they decided what to do in these cases and implemented it that way. This is great so long as everything works properly, but when it doesn't you are at the mercy of Celluloid (#132, #135, etc.).
Conversely, concurrent-ruby is deliberately unopinionated. We ask our users to make these decisions for themselves and provide the tools necessary to implement your application the way you want. Plus, ThreadPoolExecutor is a low-level abstraction (much lower level than Concurrent::Future, Concurrent::Actor, Concurrent::Agent, etc.). So there is a trade-off.
But it's important to understand that these issues exist when shutting down any concurrent application written in any language. The only real difference is the toolset and the options it gives you.
@jdantonio Makes sense. Assuming we want to allow the following:
# Currently running jobs are allowed to complete, but queued jobs are discarded
SuckerPunch.shutdown_mode = :soft # DEFAULT
# All jobs are terminated immediately (both currently running and queued)
SuckerPunch.shutdown_mode = :hard
# Shutdown is blocked until both running and queued jobs complete
SuckerPunch.shutdown_mode = :none
If I understand correctly, this maps to something like:
:soft => pool.shutdown
:hard => pool.kill
:none => pool.wait_for_termination
I hear everything you're saying about giving the pools opportunities to shutdown and sending multiple signals to it. Just want to provide the basics for now. Do you see any issue with the above?
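For what it's worth, here is a minimal sketch of how that mapping might be wired up. The shutdown_mode accessor and shutdown_pool helper are hypothetical names, while shutdown, kill, and wait_for_termination are the concurrent-ruby pool methods discussed above:

```ruby
module SuckerPunch
  class << self
    attr_writer :shutdown_mode

    def shutdown_mode
      @shutdown_mode ||= :soft            # default
    end

    # Hypothetical helper: apply the configured mode to one pool.
    def shutdown_pool(pool, timeout: 10)
      case shutdown_mode
      when :soft   # running jobs finish, queued jobs are discarded
        pool.shutdown
        pool.wait_for_termination(timeout)
      when :hard   # everything is terminated immediately
        pool.kill
        pool.wait_for_termination(timeout)
      when :none   # block until running and queued jobs complete
        pool.shutdown                     # must precede the wait
        pool.wait_for_termination         # no timeout: wait indefinitely
      end
    end
  end
end
```

Note that even :hard and :none still pair the signal with a wait, per the recipe above.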
When I ran the script in the none mode (wait_for_termination), the following was triggered when it exhausted the queue:
Executing job 499
Executing job 498
/Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/synchronization/mri_lockable_object.rb:43:in `sleep': No live threads left. Deadlock? (fatal)
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/synchronization/mri_lockable_object.rb:43:in `wait'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/synchronization/mri_lockable_object.rb:43:in `ns_wait'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/synchronization/abstract_lockable_object.rb:43:in `ns_wait_until'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/atomic/event.rb:67:in `block in wait'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `block in synchronize'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `synchronize'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `synchronize'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/atomic/event.rb:64:in `wait'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/executor/ruby_executor_service.rb:49:in `wait_for_termination'
from /Users/bhilkert/Dropbox/code/sucker_punch/lib/sucker_punch/shutdown_mode.rb:21:in `shutdown'
from /Users/bhilkert/Dropbox/code/sucker_punch/lib/sucker_punch/job.rb:67:in `shutdown'
from /Users/bhilkert/Dropbox/code/sucker_punch/lib/sucker_punch/job.rb:26:in `block in included'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/utility/at_exit.rb:72:in `call'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/utility/at_exit.rb:72:in `block in run'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/utility/at_exit.rb:70:in `each'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/utility/at_exit.rb:70:in `run'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/utility/at_exit.rb:88:in `runner'
from /Users/bhilkert/.rbenv/versions/2.2.0/lib/ruby/gems/2.2.0/gems/concurrent-ruby-1.0.0/lib/concurrent/utility/at_exit.rb:49:in `block (2 levels) in install'
Thoughts?
A call to #wait_for_termination must occur after a call to either #kill or #shutdown. PR #147 is a spike of what I was trying to describe above. It needs some work to be production-ready, but I think (hope) it demonstrates the ideas. It can be run against the bin/shutdown script as a demonstration.
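For context, that fatal error is MRI's deadlock detector: every remaining thread was blocked with nothing left to wake it, which is exactly what happens when wait_for_termination runs before anything has signalled the pool. The same error can be reproduced with nothing but the stdlib (the subprocess spawn here is just for illustration):

```ruby
require 'rbconfig'

# A main thread blocking on a Queue that nothing will ever push to is the
# smallest reproduction of "No live threads left. Deadlock? (fatal)" on MRI.
output = %x(#{RbConfig.ruby} -e 'Queue.new.pop' 2>&1)
puts output

# In the pool's case the fix is ordering: call #shutdown (or #kill) first,
# so the termination event can actually be set, then #wait_for_termination.
```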
@jdantonio Thanks so much for the example. Seeing code firsthand always helps me wrap my head around it.
I want to take a step back from what I was doing with the modes. Part of why I think Sucker Punch is valuable to the community, especially to those with little or no multi-threading experience, is that it removes these hairy details. Prior to seeing the Celluloid output about not shutting down properly, there wasn't a single question about how threads exited or the effect on jobs currently in process.
With that in mind, I'd like to provide a sane default while also allowing those more interested to provide a custom shutdown routine. That'll keep the details away from those just looking to do simple logging or send emails (the two most common use cases). For those with a more robust use case, they'll have the opportunity to configure any one of the modes previously discussed.
Having said all that, I'd like the default to be something like what I previously described as the soft mode, but exiting after 10 seconds. Ten seconds is the time frame Heroku uses to forcefully shut down the application, and I have a feeling Heroku is the most common host of Sucker Punch, given its use in saving money on an additional worker dyno. I spiked out something like what you described; however, when I run the bin/shutdown script, new jobs continue to be picked up even after the shutdown method is sent to the pool. I didn't find that to be the case before when using AtExit. Is there something different about this implementation that would cause that?
Also, with the 10-second limit in mind, I don't think I can iterate through the queues and call wait_for_termination on each, given that the last queue in the map would be cut off before it received the call if Heroku killed the process (similar to your comment here).
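One way to respect a single 10-second budget across all queues is to signal every pool first, then wait against one shared monotonic deadline instead of giving each pool its own full timeout. The pools collection and helper name here are hypothetical:

```ruby
# Signal all pools up front, then spread ONE 10-second budget across the
# waits, so the last queue in the map still gets a turn before the host
# (e.g. Heroku) force-kills the process.
def shutdown_all(pools, budget: 10)
  pools.each(&:shutdown)    # every pool stops grabbing jobs immediately
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + budget
  pools.each do |pool|
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    pool.wait_for_termination([remaining, 0].max)
  end
  pools.each(&:kill)        # anything still alive gets the hard stop
end
```

Because every pool is signalled before any wait begins, the pools drain in parallel rather than one at a time.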
Also, out of curiosity, what's the difference between using at_exit and Concurrent::AtExit.add?
:+1: to setting reasonable defaults. That's what I've always tried to do with c-r. That's why we have at_exit handlers, global thread pools, an internal monotonic clock, etc. Most users can simply create futures, promises, actors, etc. and not worry about the details. Advanced users have much greater control.
Concurrent::AtExit is an internal implementation detail that I probably didn't need to mention. Under the hood it just uses at_exit. The problems with at_exit are 1) the order in which the handlers are created is important and 2) once you call it you can't change it. Internally we use AtExit to create a registry of procs that we then run from within a single at_exit handler. This gives us greater control internally. In your case you probably only need the one handler, and the order it runs in with respect to handlers registered by other gems is probably irrelevant. So a single at_exit call at the end of your initialization routine is probably sufficient--no need to delegate back to c-r.
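Concretely, the single-handler setup could look like this at the end of Sucker Punch's initialization (the POOLS registry and shutdown_all are illustrative names, not the library's actual API):

```ruby
module SuckerPunch
  POOLS = []   # illustrative registry of every pool the library creates

  def self.shutdown_all(timeout: 10)
    POOLS.each(&:shutdown)
    POOLS.each { |pool| pool.wait_for_termination(timeout) }
  end
end

# One at_exit call is enough; no need to delegate to Concurrent::AtExit.
at_exit { SuckerPunch.shutdown_all }
```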
I'll try to take a look at your code as soon as possible. I'm heading out of town tomorrow evening for a conference so this will likely be a very busy week. My apologies in advance if I don't get to this right away. I'll do my best to help you get 2.0 out while Rails 5 is still in beta. It would be awesome to see Rails 5 able to use the updated Sucker Punch!
PS: I created the new built-in `:async' job processor and queue adapter in Rails 5 so I'm very familiar with how queue adapters work. Once 2.0 is ready I'm happy to help you update the sucker_punch adapter.
Awesome! Reading through the async adapter was one of the reasons I moved forward converting sucker punch to c-r :)
On Sunday, January 3, 2016, jdantonio notifications@github.com wrote:
@jdantonio Going to merge this into the concurrent branch for now. Would love to continue the conversation on that PR.
@jdantonio - Can you suggest a way to allow currently processing jobs to complete, but not have the workers pick up new jobs?
I think I might be a little unsure of what shutdown will do vs. kill. Using this script, both seem to exit immediately.
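A likely explanation for both appearing to exit immediately: when the main thread finishes, Ruby kills every other thread, so unless something blocks (a join, or wait_for_termination on the pool), the pool's threads die with the process no matter which method was called. A stdlib illustration:

```ruby
# Once the main thread falls off the end of the script, Ruby silently
# kills all remaining threads -- so without a blocking call the process
# "exits immediately" whether the pool got #shutdown or #kill.
worker = Thread.new do
  sleep 0.1              # simulate an in-flight job
  print "job finished"
end

worker.join              # remove this line and "job finished" never prints
```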