craigahobbs / unittest-parallel

Parallel unit test runner for Python with coverage support
MIT License

Occasional exception in multiprocessing when running in Python 3.7 #15

Closed erezsh closed 1 year ago

erezsh commented 2 years ago

I noticed our test suite sometimes fails with a multiprocessing exception, always the same one (TypeError: an integer is required (got type NoneType)). The chance seems to be around 50/50, and if I rerun the tests it usually passes. It only happens under Python 3.7.

Tests are run with poetry run unittest-parallel -j 16

Here is an example run: https://github.com/datafold/data-diff/runs/7197722519?check_suite_focus=true

And here is the stack trace from that run:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/runner/.cache/pypoetry/virtualenvs/data-diff-DY5pfXRE-py3.7/lib/python3.7/site-packages/unittest_parallel/main.py", line 269, in run_tests
    if self.failfast.is_set():
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/multiprocessing/managers.py", line 1088, in is_set
    return self._callmethod('is_set')
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/multiprocessing/managers.py", line 818, in _callmethod
    conn.send((self._id, methodname, args, kwds))
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
TypeError: an integer is required (got type NoneType)
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/runner/.cache/pypoetry/virtualenvs/data-diff-DY5pfXRE-py3.7/bin/unittest-parallel", line 8, in <module>
    sys.exit(main())
  File "/home/runner/.cache/pypoetry/virtualenvs/data-diff-DY5pfXRE-py3.7/lib/python3.7/site-packages/unittest_parallel/main.py", line 116, in main
    results = pool.map(test_manager.run_tests, test_suites)
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/multiprocessing/pool.py", line 268, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/hostedtoolcache/Python/3.7.13/x64/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
TypeError: an integer is required (got type NoneType)
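
The traceback shows the pattern involved: the main process calls pool.map(test_manager.run_tests, test_suites), and inside each worker run_tests calls self.failfast.is_set(), which is a method call on a manager proxy and therefore a message sent over a multiprocessing connection. The following is a minimal sketch of that pattern, not unittest-parallel's code. The class FailfastRunner, the main() helper, and the suite names are hypothetical. It pins the 'fork' start method (the Linux default in Python 3.7), so it is POSIX-only.

```python
import multiprocessing


class FailfastRunner:
    """Hypothetical stand-in for the object whose run_tests method the pool calls."""

    def __init__(self, failfast):
        # failfast is an Event proxy; each method call on it is a round trip
        # over a multiprocessing connection to the manager process
        self.failfast = failfast

    def run_tests(self, suite_name):
        # Same kind of call that raised in the traceback (managers.py is_set)
        if self.failfast.is_set():
            return (suite_name, 'skipped')
        return (suite_name, 'ran')


def main(suites, failfast=False):
    # 'fork' was the Linux default start method in Python 3.7
    ctx = multiprocessing.get_context('fork')
    with ctx.Manager() as manager:
        event = manager.Event()
        if failfast:
            event.set()
        test_manager = FailfastRunner(event)
        with ctx.Pool(4) as pool:
            # The bound method, and the Event proxy inside it, is pickled to each worker
            return pool.map(test_manager.run_tests, suites)


if __name__ == '__main__':
    print(main(['suite_a', 'suite_b', 'suite_c']))
```

Because every is_set() check goes through a proxy connection, any fault in the connection layer surfaces at that line, which matches where the traceback points.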
craigahobbs commented 2 years ago

It's pretty telling that it only happens in Python 3.7. Could this be BPO-17560, which was fixed in Python 3.8?

https://github.com/python/cpython/issues/61760

Specifically this change?

https://github.com/python/cpython/blame/3.8/Lib/multiprocessing/connection.py#L392

erezsh commented 2 years ago

Hmm, it doesn't seem like the same error, but maybe I'm overlooking something.

craigahobbs commented 2 years ago

The change above affects the _send_bytes call at line 206 of Python 3.7.13's connection.py in your call stack.
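
As we read the linked change, Python 3.7's Connection._send_bytes always prefixes a message with a 4-byte signed big-endian length (struct.pack("!i", n)). A payload larger than 0x7fffffff bytes therefore cannot be framed. Python 3.8 sends a -1 marker followed by an 8-byte unsigned length ("!Q") for such messages. The following is a simplified sketch of that header layout, not the stdlib code. The functions frame and unframe and the limit parameter, which lets the large-message path run with small inputs, are hypothetical. The real code's separate send of header and payload for large buffers is omitted.

```python
import struct


def frame(buf: bytes, limit: int = 0x7fffffff) -> bytes:
    """Prefix buf with a length header in the 3.8-style layout (sketch)."""
    n = len(buf)
    if n > limit:
        # Too big for a signed 32-bit length: -1 marker, then unsigned 64-bit length
        return struct.pack("!i", -1) + struct.pack("!Q", n) + buf
    # Same 4-byte signed big-endian header that 3.7 uses for every message
    return struct.pack("!i", n) + buf


def unframe(data: bytes) -> bytes:
    """Strip the length header written by frame() and return the payload."""
    size, = struct.unpack("!i", data[:4])
    offset = 4
    if size == -1:
        # Large-message path: the real length follows in the next 8 bytes
        size, = struct.unpack("!Q", data[4:12])
        offset = 12
    return data[offset:offset + size]
```

On 3.7, the symptom of BPO-17560 is a struct.error from struct.pack("!i", n) when n exceeds the signed 32-bit range.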

craigahobbs commented 2 years ago

To test my theory, would it be possible for you to hack up your Python 3.7's multiprocessing/connection.py with this change? In other words, back up connection.py, apply the patch, and see whether you can still reproduce the error.

https://github.com/python/cpython/commit/bccacd19fa7b56dcf2fbfab15992b6b94ab6666b
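
To find the file to back up, and to check whether an interpreter already has the change, you can inspect the source of Connection._send_bytes. This is a heuristic of our own: has_large_message_framing() simply looks for the "!Q" format string that the fix introduced, and it assumes the stdlib .py source is available to inspect. Run it in a fresh interpreter after editing connection.py, because an already-imported module will not reflect the edit.

```python
import inspect
import multiprocessing.connection
import sys


def connection_source_path():
    # The connection.py file to back up before applying the patch
    return multiprocessing.connection.__file__


def has_large_message_framing():
    """Heuristic: True if Connection._send_bytes contains the 8-byte ("!Q") length header."""
    src = inspect.getsource(multiprocessing.connection.Connection._send_bytes)
    return '"!Q"' in src or "'!Q'" in src


if __name__ == '__main__':
    print(connection_source_path())
    print(sys.version_info[:2], has_large_message_framing())
```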

craigahobbs commented 1 year ago

Closing since Python 3.7 is near end-of-life (2023-06-27).