Closed quigybo closed 10 years ago
Hm, this is strange. I'm not using subprocess
directly. Instead I'm using asyncproc.py
.
Asyncproc seems to deal with killing itself:
def __del__(self, __killer=os.kill, __sigkill=signal.SIGKILL):
if self.__exitstatus is None:
__killer(self.pid(), __sigkill)
I wonder why it doesn't always work... I've also experienced some issues with zombie processes.
Maybe we could try to wrap client.cmdloop()
in a try-except block and then check for subprocesses to kill them...
Sometimes it pays to actualy read the output, this exception is raised inside MPlayer.del:
Exception TypeError: TypeError('signal handler must be signal.SIG_IGN, signal.SIG_DFL, or a callable object',) in <bound method MPlayer.__del__ of <orochi.player.MPlayer object at 0xb7060a4c>> ignored
If there was no alarm set, signal.signal
returns None
. cbcf173 first checks
if there is something that needs to be restored before attempting to do so.
This is probably something that should be reported upstream. Once this is merged into master I can send an email to the author (Thomas Bellman).
Ah, nice catch! Do you think you could write a small regression test (add test_asyncproc.py
in https://github.com/dbrgn/orochi/tree/master/tests) before the merge? That'll probably help Thomas too.
Ok I think I have finally figured this issue out.
According to the documentation, None
will only be returned if the previous
signal handler was not installed from Python, which means something is setting
the alarm on the C side. The following code will induce the error (on the master
branch):
import sys
from orochi.asyncproc import with_timeout
class DelFunc(object):
def __init__(self, func, *args):
self.func = func
self.args = args
def __del__(self):
self.func(*self.args)
df = DelFunc(with_timeout, 1, lambda:1)
sys.exit()
It seems that Python sets SIGALRM
when the destructors are being called during
system exit, which makes it quite difficult to test for this issue as no exception is
actually raised, only a warning is printed to sys.stderr
. Possible ways around
this could be using the xdist plugin to run the tests in a separate Python
process, or setting the signal handler directly in C/ctypes. What do you think,
is this something worth pursuing further?
Just to let you know, I amended the commit to give more detail on this issue and
changed the test to explicitly check for None
because the default value,
SIG_DFL
, is actually the int 0
.
My first thought was that we should just ignore the test, but I think this issue is hard to catch and should therefore be covered by a regression test, so it won't happen again in the future :)
Possible ways around this could be using the xdist plugin to run the tests in a separate Python process, or setting the signal handler directly in C/ctypes. What do you think, is this something worth pursuing further?
If you think it'll work, using ctypes is probably better than to add a new dependency. What do you think? I've never used ctypes, so I can't judge it.
I agree that we should try and figure out a way of inducing the error and testing for it, however for some reason the below code is not behaving as I would expect:
import ctypes
import signal
libc = ctypes.cdll.LoadLibrary('libc.so.6')
SIGFUNC = ctypes.CFUNCTYPE(None, ctypes.c_int)
def f(sig):
print "SIGALRM caught"
libc.signal(signal.SIGALRM, SIGFUNC(f))
print(signal.getsignal(signal.SIGALRM))
I would expect that getsignal
returns None
as we have set the signal in C,
but instead it returns SIG_DFL
as if no signal was set. You can use
libc.alarm(1); libc.sleep(2)
to check that the signal is set properly.
And just in case something unexpected is going on in ctypes, here is a C extension that effectively does the same thing.
#include <Python.h>
#include <signal.h>
static void
alarm_handler(int sig) {
printf("SIGALRM caught\n");
}
static PyObject *
sigtest_signal(PyObject *self) {
signal(SIGALRM, alarm_handler);
Py_RETURN_NONE;
}
static PyObject *
sigtest_alarm(PyObject *self, PyObject *args) {
int secs;
if (!PyArg_ParseTuple(args, "i", &secs))
return NULL;
alarm(secs);
Py_RETURN_NONE;
}
static PyMethodDef Sigtest_Methods[] = {
{"signal", sigtest_signal, METH_NOARGS, "Sets signal SIGALRM."},
{"alarm", sigtest_alarm, METH_VARARGS, "Sets alarm."},
{NULL, NULL, 0, NULL}
};
PyMODINIT_FUNC
initsigtest(void) {
(void) Py_InitModule("sigtest", Sigtest_Methods);
}
After building and importing it call sigtest.signal()
to setup the handler,
and signal.getsignal(signal.SIGALRM)
to see which handler is installed. As
expected this module behaves the same as the ctypes example where
signal.getsignal(signal.SIGALRM)
doesn't return None
.
According to this bug report, it has something to do with signal handlers being run in a subprocess. I will check it out when I have time.
OK, if we ran into a bug here we can't do much about it.
When I find time, I'll check out Twisted. Maybe an event based model would work for Orochi (see #21), which would also solve this issue and make the code overall easier to write/understand.
In the meantime, should I just merge this? Or do you want to do more research about the signal handling?
It isn't actually a bug in python, it is expected and documented behaviour; the bug report was for the test suite.
I will briefly look into signal handlers in a subprocess first and get back to you, so don't merge just yet.
With regards to an event-based model, I'm not sure that it will solve this issue as it will still arise whenever signals are used while Python is shutting down, like in a destructor. Anyway the difficulty isn't in fixing this issue, it lies in inducing it in a test suite.
With regards to an event-based model, I'm not sure that it will solve this issue as it will still arise whenever signals are used while Python is shutting down, like in a destructor.
Yes, it's not "fixing" this issue, but I think it could make code easier, without a lot of signals and threads and asynchronous calls... But that's more a paradigm question.
Ok maybe you should just merge this for now, I had another go at it without much success. If I come up with something in the future I will let you know.
Alright! Thanks for the work!
@quigybo, your branch is out of sync with current master. Could you please rebase your branch and create a pull request against current master? (If I'd cherry-pick your commit I'd lose the author information.)
@quigybo Hm, the test works on my system, but fails on Travis: https://travis-ci.org/dbrgn/orochi/builds/12075290 Could that be an issue with the coverage script? Or is it the exception we're testing for?
This is the output when applying the test to asyncproc.py
pre 9aa69c3 on my
machine:
============================= test session starts ==============================
platform linux2 -- Python 2.7.5 -- pytest-2.3.4
plugins: pep8, cache, cov
collected 16 items
orochi/__init__.py s
orochi/api.py s
orochi/client.py s
orochi/colors.py s
orochi/meta.py s
orochi/player.py s
tests/__init__.py s
tests/test_asyncproc.py .F
tests/test_client.py s...
tests/test_colors.py s.
tests/test_testing.py s
=================================== FAILURES ===================================
________________________________ test_asyncproc ________________________________
tests/test_asyncproc.py:30: in test_asyncproc
> assert "Exception" not in output
E assert u'Exception' not in 'Error processing lin...b72d3b8c>> ignored\n'
E u'Exception' is contained here:
E e ignored
E Exception TypeError: 'signal handler must be signal.SIG_IGN, signal.SIG_DFL, or a callable object' in <bound method DelFunc.__del__ of <__main__.DelFunc object at 0xb72d3b8c>> ignored
E ? +++++++++
--------------- coverage: platform linux2, python 2.7.5-final-0 ----------------
Name Stmts Miss Cover
-------------------------------------
orochi/__init__ 0 0 100%
orochi/api 63 46 27%
orochi/client 281 204 27%
orochi/colors 6 0 100%
orochi/meta 6 6 0%
orochi/player 107 82 23%
-------------------------------------
TOTAL 463 338 27%
================ 1 failed, 5 passed, 10 skipped in 1.34 seconds ================
And for reference, here is the relevant output from Travis:
tests/test_asyncproc.py:30: in test_asyncproc
> assert "Exception" not in output
E assert u'Exception' not in 'Coverage.py warning:...x24d2dd0>> ignored\n'
E u'Exception' is contained here:
E ollected.
E Exception SystemError: 'error return without exception set' in <bound method DelFunc.__del__ of <__main__.DelFunc object at 0x24d2dd0>> ignored
E ? +++++++++
To be honest I'm really not sure what that error is, the test works fine here both with and without virtualenv.
The output suggests something is going on with the coverage tests, but I'm not
sure why the python subprocess is affected by that. In particular I pass the
-E
flag to the subprocess, which, according to the manpage, should "ignore
environment variables like PYTHONPATH and PYTHONHOME that modify the behaviour
of the interpreter". Unless it really is just PYTHONPATH and PYTHONHOME.
If I explicitly set env={}
in check_output
, or remove the coverage tests
from pytest.ini
, the warning goes away but not the exception:
tests/test_asyncproc.py:30: in test_asyncproc
> assert "Exception" not in output
E assert u'Exception' not in 'Exception SystemErro...a246e810>> ignored\n'
E u'Exception' is contained here:
E Exception SystemError: 'error return without exception set' in <bound method DelFunc.__del__ of <__main__.DelFunc object at 0x7f1ea246e810>> ignored
E ? +++++++++
The remaining error seems to suggest that some part of the test suite is still
in effect (in particular the SystemError: 'error return without exception set'
part), but I don't know why. Any ideas?
By checking explicitly for TypeError
, the failing test is fixed.
Haha it's funny how sometimes you don't see the simplest and most obvious solution.
Not sure if it is worth the effort, but I renamed the test to more accurately
reflect what we are testing (with_timeout
) in d3595d2.
+1, I added it here: 865a788c07d409b210541d7af8eab1e7e6cc712c
After experiencing a few crashes, I ran
pgrep mplayer
and noticed severalmplayer
processes left running. These should be caught and killed before orochi exits, even after a crash.