circus-tent / circus

A Process & Socket Manager built with zmq
http://circus.readthedocs.org/

circus doesn't kill chaussette when stopping #986

Open arthurzenika opened 8 years ago

arthurzenika commented 8 years ago

Using circus together with chaussette leaves a chaussette process hanging on stop. I am not sure whether this is a misconfiguration, a circus bug, a chaussette bug (or even a bug in my app):

$ circusd --log-level debug --log-output - app.ini
[...]
2016-06-02 15:57:59 circus[30629] [INFO] Got signal SIG_INT
2016-06-02 15:57:59 circus[30629] [INFO] Arbiter exiting
2016-06-02 15:57:59 circus[30629] [DEBUG] stopping the app watcher
2016-06-02 15:57:59 circus[30629] [DEBUG] gracefully stopping processes [app] for 30.0s
2016-06-02 15:57:59 circus[30629] [DEBUG] app: kill process 31652
2016-06-02 15:57:59 circus[30629] [DEBUG] sending signal 15 to 31652
2016-06-02 15:57:59 circus[30629] [DEBUG] stopping the circusd-stats watcher
2016-06-02 15:57:59 circus[30629] [DEBUG] gracefully stopping processes [circusd-stats] for 30.0s
2016-06-02 15:57:59 circus[30629] [DEBUG] circusd-stats: kill process 30636
2016-06-02 15:57:59 circus[30629] [DEBUG] sending signal 15 to 30636
2016-06-02 15:57:59 circus[30629] [DEBUG] stopping the plugin:flapping watcher
2016-06-02 15:57:59 circus[30629] [DEBUG] gracefully stopping processes [plugin:flapping] for 30.0s
2016-06-02 15:57:59 circus[30629] [DEBUG] plugin:flapping: kill process 30634
2016-06-02 15:57:59 circus[30636] [INFO] Stats streamer stopped
2016-06-02 15:57:59 circus[30629] [DEBUG] sending signal 15 to 30634
2016-06-02 15:57:59 circus[30636] [INFO] Stats streamer stopped
2016-06-02 15:57:59 circus[30636] [INFO] Stats streamer stopped
2016-06-02 15:57:59 circus[30629] [DEBUG] reaping already dead process 30636 [circusd-stats]
2016-06-02 15:57:59 circus[30629] [INFO] circusd-stats stopped
2016-06-02 15:57:59 circus[30629] [DEBUG] reaping already dead process 30634 [plugin:flapping]
2016-06-02 15:57:59 circus[30629] [INFO] plugin:flapping stopped
2016-06-02 15:58:03 circus[30629] [DEBUG] manage_watchers is conflicting with another command
2016-06-02 15:58:08 circus[30629] [DEBUG] manage_watchers is conflicting with another command
2016-06-02 15:58:13 circus[30629] [DEBUG] manage_watchers is conflicting with another command
2016-06-02 15:58:18 circus[30629] [DEBUG] manage_watchers is conflicting with another command
2016-06-02 15:58:23 circus[30629] [DEBUG] manage_watchers is conflicting with another command
2016-06-02 15:58:28 circus[30629] [DEBUG] manage_watchers is conflicting with another command
2016-06-02 15:58:29 circus[30629] [DEBUG] sending signal 9 to 31652
2016-06-02 15:58:29 circus[30629] [DEBUG] reaping already dead process 31652 [app]
2016-06-02 15:58:29 circus[30629] [INFO] app stopped

app.ini contains:

[circus]
check_delay = 5
endpoint = tcp://127.0.0.1:5555
pubsub_endpoint = tcp://127.0.0.1:5556
stats_endpoint = tcp://127.0.0.1:5557
;debug = True
;; requires circus-web to be able to start the http dashboard 
;httpd = True

[plugin:flapping]
use = circus.plugins.flapping.Flapping
retry_in = 3
max_retry = 2

[watcher:app]
cmd = /home/arthur/.virtualenvs/app/bin/chaussette --fd $(circus.sockets.app) --backend waitress --log-level debug --log-output - --use-reloader wsgi.app
use_sockets = True
numprocesses = 1
working_dir = /home/arthur/.virtualenvs/app/etc/app.d/app/
uid = arthur
max_age = 3600
max_age_variance = 300

[env:app]
PYTHONPATH=/home/arthur/.virtualenvs/app/etc/app.d/app/
PATH=/usr/local/bin:/usr/bin:/bin

[socket:app]
host = 0.0.0.0
port = 8086

I have to use kill -9, which is a bit brutal:

$ ps xaf | grep chaussette
29762 pts/0    S+     0:00      \_ grep --color=auto chaussette
29721 ?        Sl     0:04 /home/arthur/.virtualenvs/app/bin/python2 /home/arthur/.virtualenvs/app/bin/chaussette --fd 5 --backend waitress --log-level debug --log-output - --use-reloader wsgi.app
$ kill 29721
$ ps xaf | grep chaussette
29770 pts/0    S+     0:00      \_ grep --color=auto chaussette
29721 ?        Sl     0:05 /home/arthur/.virtualenvs/app/bin/python2 /home/arthur/.virtualenvs/app/bin/chaussette --fd 5 --backend waitress --log-level debug --log-output - --use-reloader wsgi.app
$ kill -9 29721
$ ps xaf | grep chaussette
29773 pts/0    S+     0:00      \_ grep --color=auto chaussette
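To check whether a process that survives SIGTERM (like 29721 above) has child processes of its own, for instance an extra process started by --use-reloader (an assumption worth verifying, not something this report confirms), a minimal Linux-only Python sketch can scan /proc for entries whose parent PID matches. The helper name list_children is hypothetical; it is not part of circus or chaussette.

```python
import os


def list_children(pid: int) -> list[int]:
    """Return the PIDs whose parent is `pid`, by scanning /proc (Linux only)."""
    children = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Field 2 (comm) is wrapped in parentheses and may contain spaces,
        # so parse from the last ')': the next fields are state, then ppid.
        fields = stat[stat.rfind(")") + 2:].split()
        if int(fields[1]) == pid:
            children.append(int(entry))
    return sorted(children)
```

For example, list_children(29721) would list any direct children of the stuck chaussette process, much like the indented entries in the ps xaf forest view.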

Versions:

chaussette (1.3.0)
circus (0.13.0)
k4nar commented 8 years ago

I think the issue here is that Circus tried to stop the "app" watcher with a SIGTERM but couldn't (I don't know why). It therefore sent a SIGKILL after the graceful timeout (30s), which gave 31652 no chance to correctly reap its children.
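The stop sequence in the log (signal 15, up to a 30 s graceful timeout, then signal 9) can be sketched in Python as below, with one deliberate difference: it signals the whole process group with os.killpg instead of a single PID, so children of the stopped process receive the signals too. This is an illustration, not circus's implementation; it assumes the process is started with start_new_session=True so that it leads its own group, and stop_group and the stubborn worker are hypothetical.

```python
import os
import signal
import subprocess
import sys


def stop_group(proc: subprocess.Popen, graceful_timeout: float = 30.0) -> str:
    """Send SIGTERM to proc's process group; if the group leader has not
    exited after `graceful_timeout` seconds, send SIGKILL to the group.

    Returns "terminated" if the leader exited in time, otherwise "killed".
    """
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)  # cf. "sending signal 15" in the log
    try:
        proc.wait(timeout=graceful_timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        # Kill every member of the group, not only the leader, so children
        # the leader never got to reap are not left behind as orphans.
        os.killpg(pgid, signal.SIGKILL)  # cf. "sending signal 9"
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # Hypothetical stubborn worker that ignores SIGTERM.
    worker = (
        "import signal, time\n"
        "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    proc = subprocess.Popen([sys.executable, "-c", worker],
                            stdout=subprocess.PIPE,
                            start_new_session=True)  # own process group
    proc.stdout.readline()  # wait until the SIGTERM handler is installed
    print(stop_group(proc, graceful_timeout=1.0))  # killed
    print(proc.returncode)  # -9
    proc.stdout.close()
```

Signalling the group is one way to avoid orphans when the leader is killed before it can clean up; within circus the related knobs are the watcher's stop_signal and graceful_timeout settings.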

I would raise two questions:

I'm very interested in finding the answer to the second question, because I think this is a recurring bug in Circus :) .

arthurzenika commented 8 years ago

@k4nar thanks for answering, I'm available to debug this with some help.

Indeed, when I run chaussette directly I cannot stop it with Ctrl-C (even kill -9 fails). I am opening a bug there too: https://github.com/circus-tent/chaussette/issues/78

paulocheque commented 7 years ago

Any news about this? Is there a new version with this bug fixed?

I am facing this issue too. Circus doesn't stop my web process.

I tried both stop_signal = QUIT and stop_signal = KILL.