fix resque:start, start correct number of workers

dmarkow commented 10 years ago

Can you provide more explanation? It looks like all you're doing is removing the on roles(role) do block, but that should be required for Capistrano 3 to properly execute on the role designated in the :resque_worker setting.

chenyun commented 10 years ago

An on roles(role) do block already exists outside this one.

chenyun commented 10 years ago

Please read that function: you will find two identical blocks, which start more workers than desired! These shadow workers don't have PID files, so cap can't stop or restart them, which is very dangerous.
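To illustrate why workers without PID files are dangerous, here is a minimal Ruby sketch of a PID-file-driven stop step. It assumes the tmp/pids/resque_work_N.pid naming that appears in the log output later in this thread. The helpers worker_pids and stop_workers are hypothetical and are not capistrano-resque's code. A worker started without a PID file never appears in the list, so a stop or restart built this way cannot reach it.

```ruby
require "tmpdir"

# Hypothetical helper: collect worker PIDs from PID files, the way a
# PID-file-driven stop/restart task would. Workers started without a
# PID file are never found here.
def worker_pids(pid_dir)
  Dir.glob(File.join(pid_dir, "resque_work_*.pid")).sort.map do |path|
    File.read(path).strip.to_i
  end
end

# Hypothetical helper: send TERM to every PID found and return the PIDs
# that were signalled. Not called in the demo below.
def stop_workers(pid_dir)
  worker_pids(pid_dir).each_with_object([]) do |pid, stopped|
    begin
      Process.kill("TERM", pid)
      stopped << pid
    rescue Errno::ESRCH
      # Stale PID file: the process is already gone.
    end
  end
end

# Demo with a throwaway directory containing one PID file.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "resque_work_1.pid"), "19369\n")
  p worker_pids(dir) # => [19369]
end
```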

dmarkow commented 10 years ago

Which version of Capistrano are you using this on? 3.0 or 3.1? I'm 95% sure that 3.0 had a bug where the on roles block needed to be used again, but I haven't tested 3.1 yet to see if it fixes this (and if it does, we'll probably want to support both situations anyway).

I'll try and find a few minutes today to test this.

dmarkow commented 10 years ago

Ok, now that I review the code, I remember that the threads we use to start multiple workers at once weren't working properly within the on roles block, so we added another one inside the thread. I just tested this on Capistrano 3.0 and it worked fine (with 2 "foo" workers and 3 "bar" workers):

dylans-mbp ~/dev/capistrano-resque-test-app(master ✔) cap production resque:start
[deprecated] I18n.enforce_available_locales will default to true in the future. If you really want to skip validation of your locale you can set I18n.enforce_available_locales = false to avoid this message.
 INFO Starting 2 worker(s) with QUEUE: foo
 INFO [7a90f73e] Running RBENV_ROOT=~/.rbenv RBENV_VERSION=2.1.0 ~/.rbenv/bin/rbenv exec bundle exec rake RAILS_ENV=production QUEUE="foo" PIDFILE=./tmp/pids/resque_work_1.pid BACKGROUND=yes VERBOSE=1 INTERVAL=5  resque:work on foo.example.com
 INFO [abad312e] Running RBENV_ROOT=~/.rbenv RBENV_VERSION=2.1.0 ~/.rbenv/bin/rbenv exec bundle exec rake RAILS_ENV=production QUEUE="foo" PIDFILE=./tmp/pids/resque_work_2.pid BACKGROUND=yes VERBOSE=1 INTERVAL=5  resque:work on foo.example.com
 INFO [7a90f73e] Finished in 2.490 seconds with exit status 0 (successful).
 INFO [abad312e] Finished in 2.522 seconds with exit status 0 (successful).
 INFO Starting 3 worker(s) with QUEUE: bar
 INFO [f94655d4] Running RBENV_ROOT=~/.rbenv RBENV_VERSION=2.1.0 ~/.rbenv/bin/rbenv exec bundle exec rake RAILS_ENV=production QUEUE="bar" PIDFILE=./tmp/pids/resque_work_4.pid BACKGROUND=yes VERBOSE=1 INTERVAL=5  resque:work on foo.example.com
 INFO [3f9944b4] Running RBENV_ROOT=~/.rbenv RBENV_VERSION=2.1.0 ~/.rbenv/bin/rbenv exec bundle exec rake RAILS_ENV=production QUEUE="bar" PIDFILE=./tmp/pids/resque_work_5.pid BACKGROUND=yes VERBOSE=1 INTERVAL=5  resque:work on foo.example.com
 INFO [e53dc5b9] Running RBENV_ROOT=~/.rbenv RBENV_VERSION=2.1.0 ~/.rbenv/bin/rbenv exec bundle exec rake RAILS_ENV=production QUEUE="bar" PIDFILE=./tmp/pids/resque_work_3.pid BACKGROUND=yes VERBOSE=1 INTERVAL=5  resque:work on foo.example.com
 INFO [f94655d4] Finished in 3.486 seconds with exit status 0 (successful).
 INFO [e53dc5b9] Finished in 3.340 seconds with exit status 0 (successful).
 INFO [3f9944b4] Finished in 3.884 seconds with exit status 0 (successful).
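The log above shows the numbering scheme: each worker gets one PID file, numbered consecutively across queues. foo gets 1 and 2, and bar gets 3 to 5. The bar lines print out of order because they ran in parallel threads. The following minimal Ruby sketch builds such commands from a queue-to-count hash. The method worker_commands and its keyword arguments are hypothetical. The actual task hands the command to Capistrano/SSHKit for remote execution instead of returning strings.

```ruby
# Hypothetical sketch: build one "rake resque:work" command per worker,
# mirroring the command lines in the log. Worker IDs (and therefore PID
# file names) are numbered consecutively across all queues.
def worker_commands(workers, rails_env: "production", interval: 5, verbose: true)
  worker_id = 0
  workers.flat_map do |queue, count|
    Array.new(count) do
      worker_id += 1
      pid = "./tmp/pids/resque_work_#{worker_id}.pid"
      parts = ["RAILS_ENV=#{rails_env}", %(QUEUE="#{queue}"), "PIDFILE=#{pid}", "BACKGROUND=yes"]
      parts << "VERBOSE=1" if verbose
      parts << "INTERVAL=#{interval}"
      (["rake"] + parts + ["resque:work"]).join(" ")
    end
  end
end

# Same configuration as the test above: 2 "foo" workers, 3 "bar" workers.
worker_commands({ "foo" => 2, "bar" => 3 }).each { |cmd| puts cmd }
```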

Output on server from ps ux | grep resque:

ec2-2 /data/www/capistrano-resque-test-app/current ps ux | grep resque
dylan    19369  0.1  0.8 149900 61132 ?        Sl   08:46   0:00 resque-1.24.1: Waiting for foo
dylan    19380  0.1  0.7 149596 60804 ?        Sl   08:46   0:00 resque-1.24.1: Waiting for foo
dylan    19787  0.2  0.8 149892 61104 ?        Sl   08:46   0:00 resque-1.24.1: Waiting for bar
dylan    19798  0.2  0.8 149892 61080 ?        Sl   08:46   0:00 resque-1.24.1: Waiting for bar
dylan    19809  0.2  0.8 149896 61076 ?        Sl   08:46   0:00 resque-1.24.1: Waiting for bar
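Comparing ps output like the above against the configured counts is a quick way to spot shadow workers. The sketch below is a minimal Ruby version that assumes the "resque-VERSION: Waiting for QUEUE" process title shown above. Busy workers show a different title and are not counted. Both method names are hypothetical.

```ruby
# Hypothetical sketch: count idle Resque workers per queue from "ps ux"
# output, based on the "resque-X.Y.Z: Waiting for <queue>" process title.
def count_waiting_workers(ps_output)
  ps_output.each_line.with_object(Hash.new(0)) do |line, counts|
    queue = line[/resque-[\d.]+: Waiting for (\S+)/, 1]
    counts[queue] += 1 if queue
  end
end

# Returns { queue => surplus } for every queue with more idle workers
# than expected (e.g. duplicates started without PID files).
def extra_workers(ps_output, expected)
  counts = count_waiting_workers(ps_output)
  expected.each_with_object({}) do |(queue, want), extra|
    have = counts[queue]
    extra[queue] = have - want if have > want
  end
end

sample = "dylan 19369 ... resque-1.24.1: Waiting for foo\n" \
         "dylan 19787 ... resque-1.24.1: Waiting for bar\n"
p count_waiting_workers(sample)
p extra_workers(sample, { "foo" => 1, "bar" => 1 })
```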

I will test on 3.1 next.

dmarkow commented 10 years ago

Ok, it looks like the issue might actually be SSHKit 1.3.0. With SSHKit 1.3.0, my deploy hangs completely before it even tries to start a worker (it hangs with your branch too).

I may just disable the threading for now. Workers will take a little longer to start, but at least it should be stable.
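The trade-off between threaded and sequential startup can be shown with a small toggle. This is a minimal Ruby sketch in which the block passed to start_workers is a hypothetical stand-in for the remote rake call. Neither name comes from capistrano-resque.

```ruby
# Hypothetical sketch: start workers either in parallel threads or one
# at a time. The block stands in for the remote "rake resque:work" call.
def start_workers(worker_ids, threaded:, &start_worker)
  if threaded
    # Faster, but every thread drives the remote side concurrently:
    # the pattern that misbehaved under SSHKit 1.3.0 in this discussion.
    worker_ids.map { |id| Thread.new(id) { |wid| start_worker.call(wid) } }.each(&:join)
  else
    # Slower but predictable: each start finishes before the next begins.
    worker_ids.each { |id| start_worker.call(id) }
  end
end

started = []
start_workers(1..5, threaded: false) { |id| started << id }
p started # => [1, 2, 3, 4, 5]
```

With threads disabled, total startup time is roughly the sum of the individual start times instead of the longest one. That is the "little longer" cost mentioned above.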

If you can provide the relevant capistrano/capistrano-* gem versions from your Gemfile and the relevant parts of your Capistrano configuration so I can try to reproduce this, that would be helpful.

dmarkow commented 10 years ago

So, it seems the threading code was the issue, not the extra on roles block. I'm disabling threads in Cap 3.x for now to help with stability.

SSHKit 1.3.0 now reuses SSH connections, whereas SSHKit 1.2.0 opened a new one every time. I think that may cause problems with tracking command results inside threads: using threads doesn't seem to be compatible with SSHKit 1.3.0. In some cases (such as @chenyun's) it created duplicate workers, and in others (mine) the deploy didn't even get past Capistrano's check that the deployment directory exists.
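As a generic illustration of the suspected problem (this is not SSHKit's code), the Ruby sketch below models one connection shared by several threads. The class SharedConnection and its fake_execute method are hypothetical. Running a command and reading back its result must happen atomically. Otherwise, one thread can read the status of another thread's command. The sketch does not show how SSHKit 1.3.0 actually pools connections or how Capistrano/sshkit#99 addresses it.

```ruby
# Hypothetical sketch: a connection object shared by several threads.
class SharedConnection
  def initialize
    @lock = Mutex.new
    @last_status = nil
  end

  # Runs a command and returns its status while holding the lock.
  # Without the lock, another thread could overwrite @last_status
  # between the assignment and the read.
  def run(command)
    @lock.synchronize do
      @last_status = fake_execute(command)
      @last_status
    end
  end

  private

  # Stand-in for a remote call: status 0 for commands starting with "ok".
  def fake_execute(command)
    sleep 0.001
    command.start_with?("ok") ? 0 : 1
  end
end

conn = SharedConnection.new
threads = %w[ok-1 fail-2 ok-3 fail-4].map do |cmd|
  Thread.new { [cmd, conn.run(cmd)] }
end
p threads.map(&:value) # => [["ok-1", 0], ["fail-2", 1], ["ok-3", 0], ["fail-4", 1]]
```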

dmarkow commented 10 years ago

@chenyun Thanks for helping out with this!

chenyun commented 10 years ago

@dmarkow Sorry for the late reply.

chenyun commented 10 years ago

Do you still need the capistrano/capistrano-* gem versions?

dmarkow commented 10 years ago

@chenyun Not for the time being. I'm working with the SSHKit team to make it thread-safe. Once we get that updated, if you still have problems, I'll probably need more info. Thanks!

dmarkow commented 10 years ago

This threading issue will be fixed upstream in Capistrano/sshkit#99. Once that gets a new gem release, I'll add the threading code back in and update the gem dependencies.
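One way to "support both situations", as suggested earlier in the thread, would be to gate threading on the installed SSHKit version. Below is a minimal Ruby sketch using RubyGems' version comparison. The method threads_safe? is hypothetical, and fixed_version is a placeholder for whichever release ships the upstream fix. No specific version number is implied.

```ruby
require "rubygems"

# Hypothetical sketch: allow threaded worker startup only on SSHKit
# versions where it is expected to work. Versions before 1.3.0 opened a
# new connection per command; fixed_version is a placeholder for the
# release that contains the upstream thread-safety fix.
def threads_safe?(sshkit_version, fixed_version:)
  version = Gem::Version.new(sshkit_version)
  version < Gem::Version.new("1.3.0") || version >= Gem::Version.new(fixed_version)
end
```

In a deploy, the installed version could be read with Gem.loaded_specs["sshkit"].version.to_s and passed in as sshkit_version.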

9z0b3t1c / capistrano-resque

fix resque:start, start correct number of workers #74