Closed bdewater closed 7 years ago
hey @bdewater - sorry for the lack of response. Been down for the count with the stomach flu. I plan to take a look today. Will be in touch 👍
@bdewater setting `workers 1` in the `SendJob` should fix it up. Let me know.
@jdantonio Wondering if you could chime in...
Sucker Punch queues default to 2 workers. I think most people leave it as is, but anything over 1 is susceptible to an issue when attempting to empty the queues upon shutdown.
When the `CountDownLatch` is posted to each queue, it's pushed to the end of the queue. This only waits properly for all preceding jobs to complete if there is exactly 1 worker per queue. Say a queue has 2 workers and 1 job is currently running but yet to complete (e.g., there's a 10 sec. sleep in it): the `CountDownLatch` is posted to the queue, picked up by the idle (2nd) worker, and executed immediately.
I think if a queue has 100 jobs in it and 2 workers running, it would try to exhaust all of them (assuming they execute within the `shutdown_timeout`), and in the worst case the last job will be cut off.
I'm wondering if there's a better strategy to make sure the `CountDownLatch` job isn't executed until ALL other jobs have finished, even those run by other workers. My first thought was to adjust the number of workers on the fly to 1 and then post the job, but I'm unsure what would happen to any active workers when that call is made.
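The race described above can be simulated with plain Ruby threads standing in for Sucker Punch's worker pool (the names here are illustrative, not the gem's API): a slow job is already queued, the "latch" job is pushed to the end, and the idle second worker grabs it immediately.

```ruby
require 'thread'

queue  = Queue.new
events = Queue.new # records which event happens first

# A slow job is already enqueued...
queue << -> { sleep 0.5; events << :job_finished }
# ...then the shutdown "latch" job is pushed to the END of the same queue.
queue << -> { events << :latch_released }
queue.close

# Two workers drain the queue, mirroring the default of 2 workers per queue.
workers = 2.times.map do
  Thread.new do
    while (job = queue.pop) # pop returns nil once the closed queue is empty
      job.call
    end
  end
end

first = events.pop # the idle 2nd worker runs the latch while the job still sleeps
workers.each(&:join)
puts "first event: #{first}"
```

With 2 workers the latch "wins" while the slow job is still running; with `workers 1` the latch could only run after the slow job finished.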
The following code exits immediately:
```ruby
require 'sucker_punch'

class FakeJob
  include SuckerPunch::Job

  def perform
    sleep 2
    puts "inside FakeJob"
  end
end

FakeJob.perform_async
```

```
[02:13:48] bhilkert [~/Desktop] $ ruby sp.rb
```
The following code will properly wait because the latch is run AFTER the enqueued job:
```ruby
require 'sucker_punch'

class FakeJob
  include SuckerPunch::Job
  workers 1

  def perform
    sleep 2
    puts "inside FakeJob"
  end
end

FakeJob.perform_async
```

```
[02:13:58] bhilkert [~/Desktop] $ ruby sp.rb
inside FakeJob
```
@bdewater Can you try the `alt-shutdown` branch and let me know if that works as expected?
@jdantonio / @pitr-ch - Do you think this is a reasonable approach? https://github.com/brandonhilkert/sucker_punch/commit/70aa7bd0cbc7e57aa328fdfb33801339f1cb27e1
@brandonhilkert I apologize for the late response, I've been out of town for work.
> This will only properly wait for all other jobs before it to complete if there is 1 worker for each queue.
As soon as I saw this I realized you were correct. This is a legit bug. :-( I haven't had a chance to look at the suggested fix yet, but your description sounds solid. I'll look at it more deeply once I get home.
Fixed in 2.0.3
Thanks everybody, and sorry for my own delay in responding (vacation). I've tried 2.0.3 in the Rails app and it works! 🙇
@jdantonio do you mean a bug in concurrent-ruby and if so, do you want me to open an issue?
@bdewater No, a bug here. I implemented the shutdown code in sucker_punch when @brandonhilkert moved to c-r. My algorithm here was fundamentally wrong, as you figured out. 😊
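For readers following along, here is a sketch of one drain strategy that does wait for all workers (my own illustration, not necessarily what the 2.0.3 commit does): instead of enqueueing a sentinel job that any idle worker can grab, close the queue and join every worker thread, so shutdown only proceeds once all in-flight jobs have finished.

```ruby
require 'thread'

queue     = Queue.new
completed = Queue.new

# Three short jobs, more jobs than workers.
3.times { |i| queue << -> { sleep 0.1; completed << i } }
queue.close # no further jobs accepted; pop returns nil once drained

workers = 2.times.map do
  Thread.new do
    while (job = queue.pop)
      job.call
    end
  end
end

workers.each(&:join) # returns only after every worker has finished its last job
puts "completed #{completed.size} of 3 jobs before exit"
```

Joining the workers (rather than waiting on a queued latch) makes worker count irrelevant to correctness; a timeout-wrapped join would preserve the `shutdown_timeout` behavior.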
Hi! I've run into an interesting issue while upgrading a Rails app from Sucker Punch 1.x to 2.0. The tl;dr is that during a staging test Airbrake's (4.3.8) test rake task did not send a notification. After investigation it seems the Ruby process quits while the job isn't finished.
The output with SP 1.2:
With 2.0:
First I copied over Airbrake's `SendJob` under a different name into the app directory so I could stick a `binding.pry` in there. With a `step` and a few `next` calls, it ended up in sender.rb on line 54, just before doing the HTTP POST, where it mysteriously exited.

I ran it again with `byebug --no-quit` for post-mortem debugging, and all of a sudden the POST went through just fine:

The `(byebug:ctrl)` prompt prefix indicates execution has ended (docs). A split second after displaying that prompt, the `[Airbrake] Success: Net::HTTPOK` message is output, which is why the `quit` command is not prefixed by the prompt.

To confirm my theory that the process is ended too fast, I reconfigured Airbrake as such:
and it works as expected.
In https://github.com/brandonhilkert/sucker_punch/issues/188#issue-191922704 it is suggested to run jobs inline from rake tasks (which also works for my case), but I was under the impression SP would wait a little while (`shutdown_timeout` seconds) before quitting?
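For what it's worth, the Sucker Punch 2.x README describes a global setting for this timeout (worth double-checking against the installed version), so a rake task that must deliver a job before exiting could raise it:

```ruby
# e.g. in config/initializers/sucker_punch.rb
SuckerPunch.shutdown_timeout = 15 # seconds to wait for in-flight jobs at exit
```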