AsydSolutions / ASYD

ASYD is a powerful, versatile, agentless and easy-to-use server deployment automation system, with integrated web interface and monitoring.
https://www.asyd.eu
GNU General Public License v3.0

ASYD freezes and crashes randomly #98

Closed Fusl closed 9 years ago

Fusl commented 9 years ago

I seem to be running into ASYD freezes/crashes at random times.

The first crash occurred during discarding/closing notifications:

31.220.45.236 - - [15/Jun/2015:14:07:41 +0000] "POST /notification/dismiss HTTP/1.1" 200 - 0.0115
31.220.45.236 - - [15/Jun/2015:14:07:42 +0000] "POST /notification/dismiss HTTP/1.1" 200 - 0.0132
31.220.45.236 - - [15/Jun/2015:14:07:42 +0000] "POST /notification/dismiss HTTP/1.1" 200 - 0.0154
E, [2015-06-15T14:08:44.581703 #15115] ERROR -- : worker=0 PID:15119 timeout (61s > 60s), killing
E, [2015-06-15T14:08:44.595030 #15115] ERROR -- : reaped #<Process::Status: pid 15119 SIGKILL (signal 9)> worker=0
E, [2015-06-15T14:08:44.597598 #15115] ERROR -- : No such process (Errno::ESRCH)
unicorn.conf.rb:23:in `kill'
unicorn.conf.rb:23:in `block in reload'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:523:in `call'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:523:in `spawn_missing_workers'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:540:in `maintain_worker_count'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:294:in `join'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/bin/unicorn:126:in `<top (required)>'
/usr/local/bin/unicorn:23:in `load'
/usr/local/bin/unicorn:23:in `<main>'

The second crash occurred after adding a new host to ASYD:

E, [2015-06-28T10:34:16.832669 #2458] ERROR -- : worker=0 PID:2462 timeout (61s > 60s), killing
E, [2015-06-28T10:34:16.837715 #2458] ERROR -- : reaped #<Process::Status: pid 2462 SIGKILL (signal 9)> worker=0
E, [2015-06-28T10:34:16.838040 #2458] ERROR -- : No such process (Errno::ESRCH)
unicorn.conf.rb:23:in `kill'
unicorn.conf.rb:23:in `block in reload'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:523:in `call'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:523:in `spawn_missing_workers'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:540:in `maintain_worker_count'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:294:in `join'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/bin/unicorn:126:in `<top (required)>'
/usr/local/bin/unicorn:23:in `load'
/usr/local/bin/unicorn:23:in `<main>'

I was running v0.08 and have just upgraded to see whether the error goes away. I will keep you updated on this issue.

Fusl commented 9 years ago

Next crash during deploying:

E, [2015-06-29T12:31:32.537179 #26995] ERROR -- : worker=0 PID:26999 timeout (61s > 60s), killing
E, [2015-06-29T12:31:32.562995 #26995] ERROR -- : reaped #<Process::Status: pid 26999 SIGKILL (signal 9)> worker=0
E, [2015-06-29T12:31:32.563564 #26995] ERROR -- : No such process (Errno::ESRCH)
unicorn.conf.rb:23:in `kill'
unicorn.conf.rb:23:in `block in reload'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:523:in `call'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:523:in `spawn_missing_workers'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:540:in `maintain_worker_count'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/lib/unicorn/http_server.rb:294:in `join'
/var/lib/gems/1.9.1/gems/unicorn-4.9.0/bin/unicorn:126:in `<top (required)>'
/usr/local/bin/unicorn:23:in `load'
/usr/local/bin/unicorn:23:in `<main>'

Choms commented 9 years ago

Can you provide more info? Did the UI freeze after you launched the deploy? Did the deploy go fine anyway? What Ruby, SQLite and unicorn versions are you using?

These crashes are really strange and not very descriptive, and I'm unable to reproduce them. It just looks like the unicorn worker process hangs for 61 seconds and then restarts.

In those cases, did you have an unusually high number of requests to the ASYD UI (e.g. several tabs open on the task list, task detail or similar)?

Fusl commented 9 years ago

Not directly after launching the deploy. I can still click around for a few seconds or even minutes, then the web interface suddenly stops responding. After the timeout occurs it closes all connections, and ASYD crashes and does not restart automatically.

I saw other people having the same issue, but they worked around it by specifying 600, 900, 1800, etc. as the timeout.
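
For reference, the timeout in question would normally be the `timeout` directive in the unicorn configuration file. It is given in seconds and defaults to 60, which matches the `61s > 60s` in the logs above. A minimal sketch, assuming ASYD's `unicorn.conf.rb` uses the standard directive, with 600 picked only because it is one of the values mentioned:

```ruby
# unicorn.conf.rb (fragment, evaluated by unicorn's configurator)
# Assumption: the rest of the file is left unchanged.
# The master kills any worker that stays busy longer than this many seconds.
timeout 600
```

This only postpones the kill; a worker that is genuinely stuck would still be terminated once the longer limit is reached.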

There was no high number of requests to the UI. I'm blocking any request not coming from my IP address, and the last time this happened I was not accessing the UI at all after clicking the deploy button (I only noticed the crash because ASYD didn't show up in the ps fauxww output).

root@fvz-at-asyd01:~# ruby -v
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]
root@fvz-at-asyd01:~# sqlite3 --version
3.7.13 2012-06-11 02:05:22 f5b5a13f7394dc143aa136f1d4faba6839eaa6dc
root@fvz-at-asyd01:~# unicorn -v
unicorn v4.9.0

Choms commented 9 years ago

Check out commit 9c27c4df5fcd47aae5965c0cf8b9d77af06eec94

I'll push it to master soon. That should solve this, as the web worker gets respawned. In any case, the web worker crashing or respawning does not affect the deploying functionality. (In the next release the ASYD processes are named, so they are easy to identify using ps.)
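
The `Errno::ESRCH` in the traces above is raised by a `kill` call at `unicorn.conf.rb:23`, that is, from the configuration file rather than from unicorn itself. Neither that file nor the referenced commit is shown in this thread, so the following is only a sketch of one common way to make such a call tolerate a process that has already exited. `safe_kill` is a hypothetical helper name, not something taken from ASYD:

```ruby
# Hypothetical helper: send a signal to a PID without raising when the
# target process no longer exists.
# Returns true if the signal was delivered, false if the process was gone.
def safe_kill(signal, pid)
  Process.kill(signal, pid)
  true
rescue Errno::ESRCH
  # The process has already exited or been reaped (for example after the
  # master killed a timed-out worker), so there is nothing left to signal.
  false
end
```

With a guard like this, a hook that signals a stale PID would return `false` instead of propagating the exception up into the unicorn master, which is what the traces above suggest happened.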

I'm closing this for now