Open johnnyshields opened 3 years ago
@albus522 can you review the code on this PR? I'd like to ask about the general direction and whether you'd consider merging this.
@albus522 what steps can I take to get this merged?
@albus522 thank you for your feedback.
Can you please take a quick look at PRs #1139 (CI fix) and #1140 (Changelog)? From there I'll be happy to break this one into a series of smaller PRs.
Update: we are now running this branch at TableCheck in production on Kubernetes, serving 100,000+ jobs per day of various kinds (mailers, report generation, payment processing, cache computation, data warehousing, etc.)
Update: we've now run this branch continually for one month with no issues.
@albus522 if you are unavailable to help get this merged, would it be possible to designate someone else in the Delayed Job community to review? I'll be glad to break this in a series smaller PRs, but I need someone to give me a reasonable amount of attention to do that.
@johnnyshields Will this new fork method also work for JRuby applications? I'm currently converting an app to use JRuby and cannot run script/delayed_job
becausedaemonize
does not work in JRuby.
If yes, then it would be nice if I can configure the paths for pids and log files using ENV or an external config file. My JRuby app will be packaged using warbler
so any writing to files must be done outside the Rails project directory... hence the need for configurability. (Not sure if delayed_job
already does this as I'm currently looking into it.)
@hmistry I don't know, you'll have to try it. If JRuby supports fork
then yes.
@hmistry you can try this gem: https://github.com/headius/forkjoin.rb
However, if daemonize doesn't work (it uses fork) then this also probably won't work, though technically this PR is a "simpler" type of forking so it might work with the gem about. Again you'll have to try it; JRuby is beyond the scope here.
You may also wish to use Sidekiq rather than DelayedJob which is multi-threaded rather than multi-process and should work well on JRuby.
@johnnyshields Got it. Thank you I'll take a look at the gem. JRuby doesn't support forking processes so based on what you said, doubt this will work. Sidekiq might be better option however it requires engineering effort... will consider it if we can't find a solution with DJ.
@albus522 do you think you might be able to spend some time on this (or designated someone else to help review)? Since we've added this patch at my company we've cut our AWS bill for Delayed Job by about $5,000 / month (running on Kubernetes). Think lots of folks will benefit from it.
@albus522 pinging on this. Do you think you'll be able to carve out some time to help review this? As stated previously I'm happy to break it into a series of smaller PRs that can be merged in sequence.
We've been using this in production for months and it has been working like a charm.
@albus522 ping on this. We've processed literally billions of jobs on this branch having very diverse workloads and its working without a hitch.
This is super useful for debugging.
@johnnyshields I'm a bit confused, how does one use --fork
and have the process in the foreground with --pools
? The forking code checks for worker_count > 1
https://github.com/collectiveidea/delayed_job/blob/dfa5561b0f220f0eb08104aee0aec876047ffa86/lib/delayed/launcher/forking.rb#L13 before it attempts to run a loop for the foreground.
But when --pools
is passed as an option, setup_workers
-> setup_pooled_workers
https://github.com/collectiveidea/delayed_job/blob/dfa5561b0f220f0eb08104aee0aec876047ffa86/lib/delayed/launcher/base.rb#L36 never incrementsworker_count
only worker_index
In addition because worker_count
is never incremented, Delayed::Worker.before_fork
is never invoked, causing
$> bin/delayed_job --fork --pools=mailers:2 start
undefined method `each' for nil:NilClass
/Development/delayed_job/lib/delayed/worker.rb:96:in `after_fork'
/Development/delayed_job/lib/delayed/launcher/base.rb:72:in `run_worker'
/Development/delayed_job/lib/delayed/launcher/forking.rb:74:in `block in fork_worker'
/Development/delayed_job/lib/delayed/launcher/forking.rb:74:in `fork'
/Development/delayed_job/lib/delayed/launcher/forking.rb:74:in `fork_worker'
/Development/delayed_job/lib/delayed/launcher/forking.rb:63:in `add_worker'
/Development/delayed_job/lib/delayed/launcher/base.rb:49:in `block (2 levels) in setup_pooled_workers'
/Development/delayed_job/lib/delayed/launcher/base.rb:49:in `times'
/Development/delayed_job/lib/delayed/launcher/base.rb:49:in `block in setup_pooled_workers'
/Development/delayed_job/lib/delayed/launcher/base.rb:47:in `each'
/Development/delayed_job/lib/delayed/launcher/base.rb:47:in `setup_pooled_workers'
/Development/delayed_job/lib/delayed/launcher/base.rb:36:in `setup_workers'
/Development/delayed_job/lib/delayed/launcher/forking.rb:12:in `launch'
/Development/delayed_job/lib/delayed/command.rb:21:in `launch'
/Development/delayed_job/lib/delayed/command.rb:26:in `daemonize'
bin/delayed_job:5:in `<top (required)>'
for each queue passed.
@chtrinh I think I never got around to completing pools and forking together, or else I never tested that. I think all other functionality besides that (queues, etc.) is well-tested. Would be glad to accept a PR to this branch.
@johnnyshields I ended up going a different way. I changed the binstub by appending loop {}
to keep the main process running in the foreground and added some Signal traps to control the daemons that were spawned.
I've managed to run delayed_job in a container (on kubernetes) by just calling bin/delayed_job run
instead of start
that way it does not fork into the background. wouldn't this be enough to do the trick?
Forking Launcher
Status
<< READY FOR REVIEW >>
I will be running this in production at TableCheck for a few weeks in the meantime where we process 100,000+ jobs of all kinds per day (mailers, SMS, payments, search indexing, etc.) across many worker servers.
script/delayed_job
(daemon mode) in productionscript/delayed_job
(forking mode) in productionWhat does this PR do?
This PR makes two "Launcher" modes for Delayed Job:
--fork
option to fork workers from parent process in the foreground."Fork mode" will only fork when using more than one worker, i.e. set by the
--number-of-workers
or--pools
args. If only one worker, it runs aDelayed::Worker
in the same process without forking.This PR is a redux of https://github.com/collectiveidea/delayed_job/pull/615.
Why?
It makes it much easier to use Delayed Job inside containers (Docker, etc.). Using daemons has several undesirable behaviors when running inside a container:
STDOUT
.The new
--fork
option solves this by running workers in a foreground process tree. When using--fork
thedaemons
gem is not required.Installing
I've added a new
Delayed::Command#launch
method. My intention is to switch to this in the generator for new installs in the next major version, because really forking should be the default and daemons should be a special option, especially in our increasingly container-based world. Here's what it looks like:Limitations
Fork mode does not yet support
restart
,stop
etc. commands; it is always assumed to bestart
. You can gracefully stop it withSIGINT
orSIGTERM
. Commands can be added in a future PR.About the Refactor
I split off the code in
Delayed::Launcher::Daemonized
fromDelayed::Command
. Following the SRP,Delayed::Command
is now only responsible for parsing the CLI args, and theLauncher
class is responsible for actually running the app. From there, I wrote aDelayed::Launcher::Forking
class, but in the end I discovered much of the code can be shared with theDaemonized
class, so I extracted out aDelayed::Launcher::Base
parent class for bothLauncher
types.A nice side effect of this PR is that we can now connect the Rake task to the
Launcher
rather than to theWorker
class, and enable things like multiple workers and pools on Rake. (Rake andscript/delayed
are now unified; this is something which was always mysterious to me as a long time user.)Extras
This PR includes #1139 (fix tests) and #1140 (cleanup README.md).
In addition, there are several quality of life improvements added to the Command script:
--pools
arg as an alias for--pool
, and allow pipe-delimited pool definitions--num-worker
arg as an alias for-n
/--number-of-workers
(and deprecate--number-of-workers
)-v
/--verbose
switch which sets quiet=false on workers-x
switch as alias for--exit-on-complete
STDERR
warningnum-workers
less than 1STDERR
warning ifnum-workers
andpools
both specified (pools
overridesnum-workers
as per existing behavior)STDERR
warning ifqueues
andpools
both specified (pools
overridesqueues
as per existing behavior)STDERR
warning ifmin-priority
is greater thanmax-priority
The Rake script has also been enhanced:
NUM_WORKERS
andPOOLS
argsdelayed/tasks.rb
todelayed/tasks.rake
to make testing easy. (Rakerequire_rake_file
method doesn't play nice with.rb
files)I'd be happy to split some of these into extra PRs but given the amount of spec coverage I've added it's probably fine to merge as is.