chenyun closed this 10 years ago
An `on roles(role) do` block already exists outside this one.
Please read that function: you will find two identical blocks, which causes more workers to start than desired! These shadow workers don't have pid files, so cap can't stop or restart them, which is very dangerous.
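To illustrate the concern, here is a minimal sketch of how a nested host loop can multiply the number of start commands. The `on` method below is a simplified stand-in written for this example, not SSHKit's real implementation, and the host names are hypothetical.

```ruby
# Simplified stand-in for SSHKit's `on`: runs the block once per host.
# This is NOT the real implementation, only a model of its iteration.
def on(hosts)
  hosts.each { |host| yield host }
end

# Hypothetical host list for the worker role.
HOSTS = %w[app1.example.com app2.example.com].freeze

# Collects the hosts on which a worker start command would be issued.
def count_starts(nested:)
  starts = []
  on(HOSTS) do |outer_host|
    if nested
      # The inner block iterates over every host again.
      on(HOSTS) { |inner_host| starts << inner_host }
    else
      starts << outer_host
    end
  end
  starts
end

puts "single block: #{count_starts(nested: false).size} start(s)"
puts "nested block: #{count_starts(nested: true).size} start(s)"
```

Under this model, a role with a single host issues the same number of starts either way, so the duplication would only show up with two or more hosts.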
Which version of Capistrano are you using this on? 3.0 or 3.1? I'm 95% sure that 3.0 had a bug where the `on roles` block needed to be used again, but I haven't tested 3.1 yet to see if it fixes this (and if it does, we'll probably want to support both situations anyway).
I'll try and find a few minutes today to test this.
Ok, now that I review the code, I remember that the threads we use to start multiple workers at once weren't working properly within the `on roles` block, so we added another one inside the thread. I just tested this on Capistrano 3.0 and it worked fine (with 2 "foo" workers and 3 "bar" workers):
dylans-mbp ~/dev/capistrano-resque-test-app(master ✔) cap production resque:start
[deprecated] I18n.enforce_available_locales will default to true in the future. If you really want to skip validation of your locale you can set I18n.enforce_available_locales = false to avoid this message.
INFO Starting 2 worker(s) with QUEUE: foo
INFO [7a90f73e] Running RBENV_ROOT=~/.rbenv RBENV_VERSION=2.1.0 ~/.rbenv/bin/rbenv exec bundle exec rake RAILS_ENV=production QUEUE="foo" PIDFILE=./tmp/pids/resque_work_1.pid BACKGROUND=yes VERBOSE=1 INTERVAL=5 resque:work on foo.example.com
INFO [abad312e] Running RBENV_ROOT=~/.rbenv RBENV_VERSION=2.1.0 ~/.rbenv/bin/rbenv exec bundle exec rake RAILS_ENV=production QUEUE="foo" PIDFILE=./tmp/pids/resque_work_2.pid BACKGROUND=yes VERBOSE=1 INTERVAL=5 resque:work on foo.example.com
INFO [7a90f73e] Finished in 2.490 seconds with exit status 0 (successful).
INFO [abad312e] Finished in 2.522 seconds with exit status 0 (successful).
INFO Starting 3 worker(s) with QUEUE: bar
INFO [f94655d4] Running RBENV_ROOT=~/.rbenv RBENV_VERSION=2.1.0 ~/.rbenv/bin/rbenv exec bundle exec rake RAILS_ENV=production QUEUE="bar" PIDFILE=./tmp/pids/resque_work_4.pid BACKGROUND=yes VERBOSE=1 INTERVAL=5 resque:work on foo.example.com
INFO [3f9944b4] Running RBENV_ROOT=~/.rbenv RBENV_VERSION=2.1.0 ~/.rbenv/bin/rbenv exec bundle exec rake RAILS_ENV=production QUEUE="bar" PIDFILE=./tmp/pids/resque_work_5.pid BACKGROUND=yes VERBOSE=1 INTERVAL=5 resque:work on foo.example.com
INFO [e53dc5b9] Running RBENV_ROOT=~/.rbenv RBENV_VERSION=2.1.0 ~/.rbenv/bin/rbenv exec bundle exec rake RAILS_ENV=production QUEUE="bar" PIDFILE=./tmp/pids/resque_work_3.pid BACKGROUND=yes VERBOSE=1 INTERVAL=5 resque:work on foo.example.com
INFO [f94655d4] Finished in 3.486 seconds with exit status 0 (successful).
INFO [e53dc5b9] Finished in 3.340 seconds with exit status 0 (successful).
INFO [3f9944b4] Finished in 3.884 seconds with exit status 0 (successful).
Output on the server from `ps ux | grep resque`:
ec2-2 /data/www/capistrano-resque-test-app/current ps ux | grep resque
dylan 19369 0.1 0.8 149900 61132 ? Sl 08:46 0:00 resque-1.24.1: Waiting for foo
dylan 19380 0.1 0.7 149596 60804 ? Sl 08:46 0:00 resque-1.24.1: Waiting for foo
dylan 19787 0.2 0.8 149892 61104 ? Sl 08:46 0:00 resque-1.24.1: Waiting for bar
dylan 19798 0.2 0.8 149892 61080 ? Sl 08:46 0:00 resque-1.24.1: Waiting for bar
dylan 19809 0.2 0.8 149896 61076 ? Sl 08:46 0:00 resque-1.24.1: Waiting for bar
I will test on 3.1 next.
Ok, looks like the issue might actually be SSHKit 1.3.0. And with SSHKit 1.3.0, my deploy is completely hanging before it even tries to start a worker (it hangs using your branch too for me).
I may just disable the threading for now -- this will result in it taking a little bit longer to start workers, but at least it should be stable.
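As a rough sketch of the trade-off, the following compares a threaded start with a sequential one. It is plain Ruby with no Capistrano or SSHKit calls; the method names are hypothetical, and the command strings only imitate the layout of the log output above.

```ruby
# Hypothetical worker configuration: queue name => worker count,
# matching the counts used in the test run above.
WORKERS = { "foo" => 2, "bar" => 3 }.freeze

# Builds one start command per worker, each with its own pid file.
# The command layout imitates the log output; it is an illustration only.
def worker_commands(workers)
  worker_id = 0
  workers.flat_map do |queue, count|
    Array.new(count) do
      worker_id += 1
      %(rake QUEUE="#{queue}" PIDFILE=./tmp/pids/resque_work_#{worker_id}.pid BACKGROUND=yes resque:work)
    end
  end
end

# Threaded start: all commands are dispatched concurrently.
def start_threaded(commands, &run)
  commands.map { |cmd| Thread.new { run.call(cmd) } }.each(&:join)
end

# Sequential start: slower, but only one command is in flight at a time.
def start_sequential(commands, &run)
  commands.each { |cmd| run.call(cmd) }
end

# Here the "runner" just records each command instead of executing it.
started = []
start_sequential(worker_commands(WORKERS)) { |cmd| started << cmd }
puts started
```

The sequential version never has two commands running at once, so it does not depend on the underlying connection handling being thread-safe.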
If you can provide the relevant capistrano/capistrano-* gem versions from your Gemfile and the relevant parts of your Capistrano configuration so I can try to reproduce this, that would be helpful.
So, it seems the threading code was the issue, not the extra `on roles` block. I'm disabling threads in Cap 3.x for now to help with stability.
SSHKit 1.3.0 now reuses SSH connections (whereas SSHKit 1.2.0 opened a new one every time), and I think that interferes with tracking command results within threads. Using threads doesn't seem to be compatible with SSHKit 1.3.0: in some cases (such as @chenyun's) it created duplicate workers, and in other cases (mine) it didn't even get past the Capistrano check that makes sure the deployment directory exists.
@chenyun Thanks for helping out with this!
@dmarkow Sorry for the late reply.
Do you still need the capistrano/capistrano-* gem versions?
@chenyun Not for the time being. I'm working with the SSHKit team to make it thread-safe. Once we get that updated, if you still have problems, I'll probably need more info. Thanks!
This threading issue will be fixed upstream in Capistrano/sshkit#99. Once that gets a new gem release, I'll add the threading code back in and update the gem dependencies.
Can you provide more explanation? It looks like all you're doing is removing the `on roles(role) do` block, but that should be required for Capistrano 3 to properly execute on the role designated in the `:resque_worker` setting.