circus-tent / circus

A Process & Socket Manager built with zmq
http://circus.readthedocs.org/

Ctrl+C make workers still alive #340

Closed CMGS closed 11 years ago

CMGS commented 11 years ago

When I test circus on my Mac and want to terminate the test process, I press Ctrl+C. However, the arbiter just shows "arbiter exit" while the worker stays alive, and circushttpd stays alive too. Maybe it's a bug?

tarekziade commented 11 years ago

Maybe your program does not handle the shutdown signal correctly. Can you show us your config?
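For readers wondering what "handling the shutdown signal" looks like on the worker side, here is a minimal sketch. It assumes the process manager stops the worker with SIGTERM and that Ctrl+C reaches it as SIGINT. The names `handle_shutdown`, `worker_loop` and `main` are hypothetical and are not part of circus or chaussette.

```python
import signal
import threading

# Set once a shutdown signal has been received.
stop_event = threading.Event()


def handle_shutdown(signum, frame):
    """Signal handler: ask the worker loop to finish."""
    stop_event.set()


def worker_loop(poll_interval=0.1):
    """Hypothetical worker body: idle until asked to stop."""
    while not stop_event.is_set():
        # Real work would go here; wait() returns early once the event is set.
        stop_event.wait(poll_interval)
    return "stopped"


def main():
    # Assumption: the manager sends SIGTERM; Ctrl+C arrives as SIGINT.
    signal.signal(signal.SIGTERM, handle_shutdown)
    signal.signal(signal.SIGINT, handle_shutdown)
    return worker_loop()
```

A worker written this way exits its loop shortly after either signal arrives. Call `main()` from the main thread, since Python only allows signal handlers to be installed there.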

CMGS commented 11 years ago

test.ini

[circus]
check_delay = 5
endpoint = tcp://127.0.0.1:5555
pubsub_endpoint = tcp://127.0.0.1:5556
stats_endpoint = tcp://127.0.0.1:5557
httpd = True
httpd_host = 0.0.0.0
httpd_port = 8081

[watcher:webworker]
cmd = /Users/CMGS/.virtualenvs/circus/bin/chaussette --fd $(circus.sockets.webapp) flask_app.app
use_sockets = True
copy_path = True
copy_env = True
warmup_delay = 0
numprocesses = 1
stdout_stream.class = StdoutStream
stderr_stream.class = StdoutStream

[socket:webapp]
host = 127.0.0.1
port = 9090


almet commented 11 years ago

I'm able to reproduce here, thanks.

ionrock commented 11 years ago

I was seeing the same thing on osx starting up mongodb. Here is my config:

[circus]
debug = True

[watcher:mongo]
cmd = /usr/local/bin/mongod --dur --rest
stdout_stream.class = StdoutStream
stderr_stream.class = StdoutStream

Feel free to get in touch if I can help test.

tarekziade commented 11 years ago

If I put a pdb breakpoint in the arbiter shutdown loop, I can see an error and a sudden exit:

Tareks-Mac-Book-Air:examples tarek$ ../bin/circusd ex.ini 
2012-12-26 15:20:00 [55684] [INFO] Starting master on pid 55684
2012-12-26 15:20:00 [55684] [INFO] sockets started
2012-12-26 15:20:00 [55684] [INFO] circusd-stats started
2012-12-26 15:20:00 [55684] [INFO] webworker started
2012-12-26 15:20:00 [55684] [INFO] Arbiter now waiting for commands
^C2012-12-26 15:20:01 [55684] [INFO] Arbiter exiting
> /Users/tarek/Dev/github.com/circus/circus/arbiter.py(383)stop_watchers()
-> watcher.stop()
(Pdb) n
Exception AttributeError: "'NoneType' object has no attribute 'path'" in <function _remove at 0x106780a28> ignored

I am trying to find out what causes this.

tarekziade commented 11 years ago

I have no idea why, but the sleep() call is what provokes the error. Removing it fixes it, and it's not really needed there anyway.

@CMGS Can you try master to see how it works for you?

CMGS commented 11 years ago

ToT @tarekziade it still fails.

Is it because I'm running it on OS X?

tarekziade commented 11 years ago

Crap. I'm on OS X too here, but it's not related.

@CMGS can you add a pdb at circus/circus/arbiter.py line 383 like this:

import pdb; pdb.set_trace()

Put it just before the stop() call. Then, when you land there after a Ctrl+C, hit 'n' and see what happens.

tarekziade commented 11 years ago

@CMGS can you try again with the latest master please?

CMGS commented 11 years ago

@tarekziade sorry for the late reply, you know, Chinese timezone lol

I tested the latest master this morning and it still failed. Then I put a pdb breakpoint in stop_watchers() in arbiter.py and got this output:

(circus)➜ circus-test circusd t.ini
2012-12-28 09:47:18 [1363] [INFO] Starting master on pid 1363
2012-12-28 09:47:18 [1363] [INFO] sockets started
2012-12-28 09:47:18 [1363] [INFO] circusd-stats started
2012-12-28 09:47:18 [1363] [INFO] webworker started
2012-12-28 09:47:18 [1363] [INFO] Arbiter now waiting for commands
2012-12-28 09:47:19 [1365] [INFO] Application is <flask.app.Flask object at 0x10aa57810>
2012-12-28 09:47:19 [1365] [INFO] Serving on fd://3
2012-12-28 09:47:19 [1365] [INFO] Using <class chaussette.backend._wsgiref.ChaussetteServer at 0x10a4240b8> as a backend
^C2012-12-28 09:47:20 [1363] [INFO] Arbiter exiting
> /Users/CMGS/Documents/Workplace/sources/test/circus-test/circus/circus/arbiter.py(394)stop_watchers()
-> watcher.stop()
(Pdb) s
--Call--
> /Users/CMGS/Documents/Workplace/sources/test/circus-test/circus/circus/util.py(303)_log()
-> @wraps(func)
(Pdb) func
<function stop at 0x104afeb18>
(Pdb) n
> /Users/CMGS/Documents/Workplace/sources/test/circus-test/circus/circus/util.py(305)_log()
-> from circus import logger
(Pdb) n
> /Users/CMGS/Documents/Workplace/sources/test/circus-test/circus/circus/util.py(307)_log()
-> if os.environ.get('DEBUG') is None:
(Pdb) logger
<logging.Logger object at 0x10473c4d0>
(Pdb) os.environ.get('DEBUG')
(Pdb) n
> /Users/CMGS/Documents/Workplace/sources/test/circus-test/circus/circus/util.py(308)_log()
-> return func(self, *args, **kw)
(Pdb) s
--Call--
> /Users/CMGS/Documents/Workplace/sources/test/circus-test/circus/circus/watcher.py(559)stop()
-> @util.debuglog
(Pdb) n
> /Users/CMGS/Documents/Workplace/sources/test/circus-test/circus/circus/watcher.py(563)stop()
-> logger.debug('stopping the %s watcher' % self.name)
(Pdb) n
> /Users/CMGS/Documents/Workplace/sources/test/circus-test/circus/circus/watcher.py(565)stop()
-> if self.stdout_redirector is not None:
(Pdb) n
> /Users/CMGS/Documents/Workplace/sources/test/circus-test/circus/circus/watcher.py(566)stop()
-> self.stdout_redirector.kill()
(Pdb) n
> /Users/CMGS/Documents/Workplace/sources/test/circus-test/circus/circus/watcher.py(568)stop()
-> if self.stderr_redirector is not None:
(Pdb) n
> /Users/CMGS/Documents/Workplace/sources/test/circus-test/circus/circus/watcher.py(569)stop()
-> self.stderr_redirector.kill()
(Pdb) n
Exception AttributeError: "'NoneType' object has no attribute 'path'" in <function _remove at 0x10446a938> ignored

Then I added a breakpoint in the watcher and found that the problem is triggered at gevent.hub line 321: return get_hub().switch()

By the way, I use the latest gevent release, version rc1, not the PyPI version. Is that the problem?

tarekziade commented 11 years ago

@CMGS are you sure you are on the 6aba761cd92cea8513118befcc0b27401ae209a2 commit?

tarekziade commented 11 years ago

So, hold on. I am removing those silly redirectors and adding a single implementation that uses the ioloop.
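To illustrate the general idea of replacing per-stream redirectors with one loop that watches every pipe, here is a stdlib-only sketch. It uses `selectors` as a stand-in for the ioloop and is not the actual circus implementation. The class name `PipeRedirector` and its methods are hypothetical.

```python
import os
import selectors


class PipeRedirector:
    """Forward data from several pipe read-ends to callbacks from one loop."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def add(self, fd, callback):
        # Watch a readable file descriptor; callback receives raw bytes.
        self._selector.register(fd, selectors.EVENT_READ, callback)

    def poll(self, timeout=0.1):
        # One loop iteration: dispatch whatever is readable right now.
        for key, _events in self._selector.select(timeout):
            data = os.read(key.fd, 4096)
            if data:
                key.data(data)
            else:
                # Empty read means the writer closed its end of the pipe.
                self._selector.unregister(key.fd)

    def close(self):
        self._selector.close()
```

Because all streams are serviced from the same loop, stopping a watcher only needs to unregister its descriptors; there is no separate greenlet or thread per stream to kill.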

tarekziade commented 11 years ago

Try again please with a fresh master ! :)

CMGS commented 11 years ago

OK, I'll try this later. With the new year coming, I have to go to my office to test it TAT


tarekziade commented 11 years ago

OK - we'll wait for your feedback but it should work now - thx!

CMGS commented 11 years ago

Sorry for the late reply.

In the last failed test, I'm sure I was on the 6aba761 commit.

Then I pulled the latest commit and tested again, and it works fine now! All workers exit when they get SIGINT. Great work!!

ionrock commented 11 years ago

This has also been working for me in my test configuration that had previously failed. Thanks!

tarekziade commented 11 years ago

thanks all for the tests