brandonhilkert / sucker_punch

Sucker Punch is a Ruby asynchronous processing library using concurrent-ruby, heavily influenced by Sidekiq and girl_friday.
MIT License
2.64k stars 114 forks source link

Fixes a small timing bug during shutdown. #151

Closed jdantonio closed 8 years ago

jdantonio commented 8 years ago

There was a small race condition in the previous implementation. This race condition did not result in bugs given the current shutdown algorithm. However, it could lead to subtle bugs should that algorithm change. This PR imposes a small performance penalty on every job post to a queue but it is the most "correct" approach.

The first issue is that I forgot to shutdown the queues in the previous implementation. The second issue is that I removed the SuckerPunch::RUNNING.true? check from the perform_* methods. The former was an oversight and should be fixed. The latter is a deliberate performance optimization that may be unnecessary.

The missing #shutdown call (added on line 59 of sucker_punch.rb) prevents any new jobs from being post to the thread pool and ensures proper thread pool shutdown. Without this line there may be shutdown problems on JRuby, which won't shutdown the JVM with running thread pools. On MRI this is less of an issue since MRI doesn't care. I consider this change to be necessary--it's a bug.

I consider the SuckerPunch::RUNNING.true? check in the perform_async and perform_in methods to be optional. They are necessary for "correctness" but impose a small performance penalty. Without these checks there is a small possibility that jobs may sneak in before an individual queue is shutdown. With the current algorithm that isn't a problem. The __run_perform function checks the flag again and skips any jobs that may have slipped in. Thus the intended shutdown behavior is preserved. Should the check in __run_perform be removed or changed to support a different algorithm (say, to let all enqueued jobs finish) then we have a timing bug. Adding the new checks in the two perform_* methods does not allow us to remove the check in __run_perform, however. The latter check is used for clearing the queue during shutdown. It must stay.

Adding the SuckerPunch::RUNNING.true? checks to the perform_* methods adds a tiny bit of latency to every job. Concurrent::AtomicBoolean is pretty fast but the time for the check is not zero. So we have a performance vs. correctness tradeoff. I personally prefer correctness but I wanted the tradeoff to be known.

As a side note, those on MRI who are performance conscious should consider installing the optional C extensions. It contains a pure-C version of AtomicBoolean that will automatically be loaded if detected (no code changes are necessary--users simply need to install the gem). The cost of the SuckerPunch::RUNNING.true? check will be significantly less if concurrent-ruby-ext is installed. There is a benchmark script that will show the difference.

brandonhilkert commented 8 years ago

:+1: I, too, am for correctness over performance in this case. Glad to have the documentation from this post though if users are curious in the future. Great stuff man, really appreciate it.