Open Jaymon opened 6 years ago
It doesn't look like there is anything I can do about this, even in python 3.7 there just isn't a good way to automatically notify the main thread that there was an exception in a child thread, my best hope was:
Python register shutdown function and then modify the thread's new function to add to global threads list, and then in the register shutdown function go through and check to see if they had an error and then raise the error if they did
but that just raises the error and prints it out, it doesn't cause a test to fail or the CLI program to exit with >0 so when running all the tests a person won't get notified the test failed. Even though you can't recover from a KeyboardInterrupt
I think it is still the best approach since it at least gives the user notification the test failed.
I'm running into this same problem again while converting our main codebase to 3.9.7, testdata.Thread
just hangs and the fix seems to be just to switch to threading.Thread
.
It's hard to figure out because it is only a problem when running the entire test suite (meaning if I run the tests individually they pass) and it seems to be the same test over and over, but if I fix that test (meaning I swap testdata.Thread
for threading.Thread
) then that test will pass but another random one will hang. All tests pass if I run that class (eg, pyt testmodule.TestClass
) or that test module (eg, pyt testmodule
), it only hangs when running the entire suite (eg, pyt -df
).
Could this be a problem with gevent? The code I'm converting is heavy with greenthreads and gevent.
This is the test that is hanging:
def test_something(self):
cm, cp = self.get_clients(2)
room = get_room()
def send():
text = testdata.get_words(100)
with self.assertRaises(LookupError):
cp.type_message(room, text)
tp = threading.Thread(target=bot_send)
#tp = testdata.Thread(target=bot_send)
tp.start()
cm.recv_callback(callback=lambda r: "<SENTINAL>" in r.path)
text = testdata.get_words(1)
cm.send_message(room, text)
tp.join()
and if that one passes (meaning I switched testdata.Thread
for threading.Thread
), this one will fail:
def test_connection_loss(self):
cu, cup = self.get_clients(2)
room = get_room()
cu.send_message(room, "foo")
m = cup.recv_message(room)
self.assertEqual("foo", m.text)
testdata.restart_api()
def cup_recv():
m = cup.recv_message(room)
self.assertEqual("bar", m.text)
t = testdata.Thread(target=cup_recv)
t.start()
cu.type_message(room, "bar")
t.join()
try:
# now, let's make sure recv errors out correctly if connection is fully lost
testdata.stop_api()
with self.assertRaises(IOError):
cup.recv_message(room, timeout=1)
I've let it sit for 5+ minutes and it just hangs, and when I ctrl^c
, it will go for a few more tests, then fail with:
======================================================================
ERROR: <SOME OTHER TEST> (moduletest)
----------------------------------------------------------------------
Traceback (most recent call last):
File ".../testmodule.py", line 255, in <SOME OTHER TEST>
File ".../python3.9/site-packages/testdata/threading.py", line 240, in join
super(Thread, self).join(*args, **kwargs)
File ".../python3.9/threading.py", line 1053, in join
self._wait_for_tstate_lock()
File ".../python3.9/threading.py", line 1069, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
File ".../python3.9/site-packages/gevent/thread.py", line 121, in acquire
acquired = BoundedSemaphore.acquire(self, blocking, timeout)
File "src/gevent/_semaphore.py", line 180, in gevent._gevent_c_semaphore.Semaphore.acquire
File "src/gevent/_semaphore.py", line 259, in gevent._gevent_c_semaphore.Semaphore.acquire
File "src/gevent/_semaphore.py", line 249, in gevent._gevent_c_semaphore.Semaphore.acquire
File "src/gevent/_abstract_linkable.py", line 521, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
File "src/gevent/_abstract_linkable.py", line 487, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
File "src/gevent/_abstract_linkable.py", line 490, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
File "src/gevent/_abstract_linkable.py", line 442, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
File "src/gevent/_abstract_linkable.py", line 451, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
gevent.exceptions.LoopExit: This operation would block forever
Hub: <Hub '' at 0x7f6e31f02100 epoll default pending=0 ref=0 fileno=6 resolver=<gevent.resolver.thread.Resolver at 0x7f6e32c4ab50 pool=<ThreadPool at 0x7f6e31edcd60 tasks=0 size=1 maxsize=10 hub=<Hub at 0x7f6e31f02100 thread_ident=0x7f6e3834b740>>> threadpool=<ThreadPool at 0x7f6e31edcd60 tasks=0 size=1 maxsize=10 hub=<Hub at 0x7f6e31f02100 thread_ident=0x7f6e3834b740>> thread_ident=0x7f6e3834b740>
Handles:
[]
which might point to some gevent error.
It's also interesting that it reports the failure as being in another test.
Switching all the testdata.Thread
calls in the module to threading.Thread
caused the tests to all pass when running the whole suite.
In one testsuite, using
testdata.threading.Thread
when (I think) aKeyboardInterrupt
was triggered.After the test that triggered the
KeyboardInterrupt
, another test usedsubprocess
(bothcheck_output
andcheck_call
) and it would stall (specifically I was usingtestdata.service.InitD.restart
after switching some raw calls to usetestdata.service
classes) and dump theKeyboardInterrupt
.This is the stacktrace:
The fix was to switch back to vanilla
from threading import Thread
instead offrom testdata.threading import Thread
.This only happened when running the whole suite, I could run the tests individually (after restarting postgres and pgbouncer because the failure left them in a bad state) and they would run just fine.