Jaymon / testdata

Python module to make testing easier, it can generate random data like names and text, run commands, fetch urls, create files and directories, and more
MIT License
10 stars 0 forks source link

Thread class causes downstream problems #20

Open Jaymon opened 6 years ago

Jaymon commented 6 years ago

In one testsuite, using testdata.threading.Thread when (I think) a KeyboardInterrupt was triggered.

After the test that triggered the KeyboardInterrupt, another test used subprocess (both check_output and check_call) and it would stall (specifically I was using testdata.service.InitD.restart after switching some raw calls to use testdata.service classes) and dump the KeyboardInterrupt.

This is the stacktrace:

Starting tests.interface.postgres_test.InterfacePGBouncerTest.test_db_disconnect
Stopping tests.interface.postgres_test.InterfacePGBouncerTest.test_db_disconnect after 3.28s
Traceback (most recent call last):
  File "/usr/local/bin/pyt", line 11, in <module>
    sys.exit(console())
  File "/usr/local/lib/python2.7/dist-packages/pyt/__main__.py", line 49, in console
    argv=test_args,
  File "/usr/local/lib/python2.7/dist-packages/pyt/tester.py", line 576, in run_test
    ret = unittest.main(**kwargs)
  File "/usr/lib/python2.7/unittest/main.py", line 95, in __init__
    self.runTests()
  File "/usr/lib/python2.7/unittest/main.py", line 232, in runTests
    self.result = testRunner.run(self.test)
  File "/usr/local/lib/python2.7/dist-packages/pyt/tester.py", line 550, in run
    result = super(TestRunner, self).run(test)
  File "/usr/lib/python2.7/unittest/runner.py", line 151, in run
    test(result)
  File "/usr/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/lib/python2.7/unittest/suite.py", line 70, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/suite.py", line 108, in run
    test(result)
  File "/usr/lib/python2.7/unittest/case.py", line 395, in __call__
    return self.run(*args, **kwds)
  File "/usr/lib/python2.7/unittest/case.py", line 331, in run
    testMethod()
  File "/vagrant/tests/interface/postgres_test.py", line 93, in test_db_disconnect
    InitD("postgresql").restart()
  File "/usr/local/lib/python2.7/dist-packages/testdata/service.py", line 34, in restart
    self.start()
  File "/usr/local/lib/python2.7/dist-packages/testdata/service.py", line 30, in start
    self.run(cmd, kwargs)
  File "/usr/local/lib/python2.7/dist-packages/testdata/service.py", line 42, in run
    ret = subprocess.check_output(cmd, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 567, in check_output
    output, unused_err = process.communicate()
  File "/usr/lib/python2.7/subprocess.py", line 791, in communicate
    stdout = _eintr_retry_call(self.stdout.read)
  File "/usr/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt

The fix was to switch back to vanilla from threading import Thread instead of from testdata.threading import Thread.

This only happened when running the whole suite, I could run the tests individually (after restarting postgres and pgbouncer because the failure left them in a bad state) and they would run just fine.

Jaymon commented 6 years ago

It doesn't look like there is anything I can do about this, even in python 3.7 there just isn't a good way to automatically notify the main thread that there was an exception in a child thread, my best hope was:

Python register shutdown function and then modify the thread's new function to add to global threads list, and then in the register shutdown function go through and check to see if they had an error and then raise the error if they did

but that just raises the error and prints it out, it doesn't cause a test to fail or the CLI program to exit with >0 so when running all the tests a person won't get notified the test failed. Even though you can't recover from a KeyboardInterrupt I think it is still the best approach since it at least gives the user notification the test failed.

Jaymon commented 3 years ago

I'm running into this same problem again while converting our main codebase to 3.9.7, testdata.Thread just hangs and the fix seems to be just to switch to threading.Thread.

It's hard to figure out because it is only a problem when running the entire test suite (meaning if I run the tests individually they pass) and it seems to be the same test over and over, but if I fix that test (meaning I swap testdata.Thread for threading.Thread) then that test will pass but another random one will hang. All tests pass if I run that class (eg, pyt testmodule.TestClass) or that test module (eg, pyt testmodule), it only hangs when running the entire suite (eg, pyt -df).

Could this be a problem with gevent? The code I'm converting is heavy with greenthreads and gevent.

This is the test that is hanging:

    def test_something(self):
        cm, cp = self.get_clients(2)
        room = get_room()

        def send():
            text = testdata.get_words(100)
            with self.assertRaises(LookupError):
                cp.type_message(room, text)

        tp = threading.Thread(target=bot_send)
        #tp = testdata.Thread(target=bot_send)
        tp.start()

        cm.recv_callback(callback=lambda r: "<SENTINAL>" in r.path)

        text = testdata.get_words(1)
        cm.send_message(room, text)
        tp.join()

and if that one passes (meaning I switched testdata.Thread for threading.Thread), this one will fail:

    def test_connection_loss(self):
        cu, cup = self.get_clients(2)
        room = get_room()

        cu.send_message(room, "foo")
        m = cup.recv_message(room)
        self.assertEqual("foo", m.text)

        testdata.restart_api()

        def cup_recv():
            m = cup.recv_message(room)
            self.assertEqual("bar", m.text)

        t = testdata.Thread(target=cup_recv)
        t.start()
        cu.type_message(room, "bar")
        t.join()

        try:
            # now, let's make sure recv errors out correctly if connection is fully lost
            testdata.stop_api()

            with self.assertRaises(IOError):
                cup.recv_message(room, timeout=1)

I've let it sit for 5+ minutes and it just hangs, and when I ctrl^c, it will go for a few more tests, then fail with:

======================================================================
ERROR: <SOME OTHER TEST> (moduletest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File ".../testmodule.py", line 255, in <SOME OTHER TEST>
  File ".../python3.9/site-packages/testdata/threading.py", line 240, in join
    super(Thread, self).join(*args, **kwargs)
  File ".../python3.9/threading.py", line 1053, in join
    self._wait_for_tstate_lock()
  File ".../python3.9/threading.py", line 1069, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
  File ".../python3.9/site-packages/gevent/thread.py", line 121, in acquire
    acquired = BoundedSemaphore.acquire(self, blocking, timeout)
  File "src/gevent/_semaphore.py", line 180, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "src/gevent/_semaphore.py", line 259, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "src/gevent/_semaphore.py", line 249, in gevent._gevent_c_semaphore.Semaphore.acquire
  File "src/gevent/_abstract_linkable.py", line 521, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait
  File "src/gevent/_abstract_linkable.py", line 487, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 490, in gevent._gevent_c_abstract_linkable.AbstractLinkable._wait_core
  File "src/gevent/_abstract_linkable.py", line 442, in gevent._gevent_c_abstract_linkable.AbstractLinkable._AbstractLinkable__wait_to_be_notified
  File "src/gevent/_abstract_linkable.py", line 451, in gevent._gevent_c_abstract_linkable.AbstractLinkable._switch_to_hub
  File "src/gevent/_greenlet_primitives.py", line 61, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_greenlet_primitives.py", line 65, in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
  File "src/gevent/_gevent_c_greenlet_primitives.pxd", line 35, in gevent._gevent_c_greenlet_primitives._greenlet_switch
gevent.exceptions.LoopExit: This operation would block forever
    Hub: <Hub '' at 0x7f6e31f02100 epoll default pending=0 ref=0 fileno=6 resolver=<gevent.resolver.thread.Resolver at 0x7f6e32c4ab50 pool=<ThreadPool at 0x7f6e31edcd60 tasks=0 size=1 maxsize=10 hub=<Hub at 0x7f6e31f02100 thread_ident=0x7f6e3834b740>>> threadpool=<ThreadPool at 0x7f6e31edcd60 tasks=0 size=1 maxsize=10 hub=<Hub at 0x7f6e31f02100 thread_ident=0x7f6e3834b740>> thread_ident=0x7f6e3834b740>
    Handles:
[]

which might point to some gevent error.

It's also interesting that it reports the failure as being in another test.

Switching all the testdata.Thread calls in the module to threading.Thread caused the tests to all pass when running the whole suite.