achimnol / aiotools

Idiomatic asyncio utilities
https://aiotools.readthedocs.io
MIT License

Rare deadlock upon shutdown #16

Closed: achimnol closed this issue 3 years ago.

achimnol commented 4 years ago

Here is a sample gdb stack trace.

Thread 1:

Traceback (most recent call first):
  <built-in method acquire of _thread.lock object at remote 0x7f0e5dc63198>
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/aiotools/server.py", line 618, in start_server
    child.join()
  File "/home/joongi/backend.ai-dev/agent-1909/src/ai/backend/agent/server.py", line 674, in main
    use_threading=True, args=(cfg, ))
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/click/core.py", line 1114, in invoke
    return Command.invoke(self, ctx)
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/joongi/backend.ai-dev/agent-1909/src/ai/backend/agent/server.py", line 687, in <module>
    sys.exit(main())
  <built-in method exec of module object at remote 0x7f0e6b435638>
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
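In Thread 1 the main thread is parked in `Thread.join()`, called from `start_server` (aiotools/server.py, line 618), so it waits on the worker threads with no timeout. As a minimal sketch only, assuming a plain list of `threading.Thread` workers, a join bounded by one shared deadline reports a stuck worker instead of hanging forever. `join_workers` and its `timeout` parameter are hypothetical names, not part of aiotools.

```python
import threading
import time
from typing import List


def join_workers(workers: List[threading.Thread], timeout: float) -> List[threading.Thread]:
    """Join all workers against one shared deadline and return those still alive.

    Hypothetical helper: a bare ``t.join()`` blocks forever if ``t`` never exits,
    which is the state Thread 1 is stuck in.
    """
    deadline = time.monotonic() + timeout
    stuck = []
    for t in workers:
        # Give each join only the time left before the shared deadline.
        t.join(max(0.0, deadline - time.monotonic()))
        if t.is_alive():
            stuck.append(t)
    return stuck


if __name__ == "__main__":
    blocker = threading.Event()
    ok = threading.Thread(target=lambda: None, name="ok")
    # daemon=True so a genuinely stuck worker cannot keep the interpreter alive
    hung = threading.Thread(target=blocker.wait, name="hung", daemon=True)
    for t in (ok, hung):
        t.start()
    print([t.name for t in join_workers([ok, hung], timeout=0.5)])  # ['hung']
    blocker.set()  # release the demo worker
```

What to do with the stuck workers (log them, abandon them, or escalate) is a separate policy choice; the sketch only makes the hang observable.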

Threads 2 to 7:

Traceback (most recent call first):
  <built-in method close of Loop object at remote 0x7f0e580015e8>
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/aiotools/server.py", line 307, in _worker_main
    loop.close()
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
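Threads 2 to 7 are all inside `loop.close()` at the end of `_worker_main` (aiotools/server.py, line 307). The `<built-in method close of Loop object>` frame shows that this loop is implemented in a compiled extension rather than being the pure-Python asyncio loop. The sketch below uses only the stdlib asyncio loop to illustrate the per-thread lifecycle those workers follow: create a private loop, run it, then close it on the way out. `worker_main` and `stop_worker` are hypothetical helpers, not aiotools' actual code.

```python
import asyncio
import threading
from typing import Dict


def worker_main(handle: Dict[str, asyncio.AbstractEventLoop], ready: threading.Event) -> None:
    """Run a private event loop in this thread until another thread stops it.

    Hypothetical stand-in for a per-thread worker; not aiotools' _worker_main.
    """
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    handle["loop"] = loop
    ready.set()  # tell the parent the loop object is available
    try:
        loop.run_forever()
    finally:
        # Finalize pending async generators, then close the loop.
        # close() is the step where the workers in the trace are parked.
        loop.run_until_complete(loop.shutdown_asyncgens())
        asyncio.set_event_loop(None)
        loop.close()


def stop_worker(handle: Dict[str, asyncio.AbstractEventLoop],
                thread: threading.Thread, timeout: float = 5.0) -> bool:
    """Ask the worker's loop to stop from another thread; return True if it exited."""
    loop = handle["loop"]
    loop.call_soon_threadsafe(loop.stop)  # schedule stop() on the loop's own thread
    thread.join(timeout)
    return not thread.is_alive()


if __name__ == "__main__":
    handle: Dict[str, asyncio.AbstractEventLoop] = {}
    ready = threading.Event()
    t = threading.Thread(target=worker_main, args=(handle, ready))
    t.start()
    ready.wait()
    print(stop_worker(handle, t), handle["loop"].is_closed())  # True True
```

`loop.stop` is scheduled through `call_soon_threadsafe()` because asyncio loop methods are generally not thread-safe when called from a thread other than the one running the loop.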

Thread 8:

Traceback (most recent call first):
  Waiting for the GIL

Extra logger subprocess:

Traceback (most recent call first):
  <built-in method read of module object at remote 0x7f0e6b39ce08>
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/multiprocessing/queues.py", line 94, in get
    res = self._recv_bytes()
  File "/home/joongi/backend.ai-dev/common-1909/src/ai/backend/common/logging.py", line 188, in log_worker
    rec = log_queue.get()
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/multiprocessing/popen_fork.py", line 73, in _launch
    code = process_obj._bootstrap()
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/joongi/backend.ai-dev/common-1909/src/ai/backend/common/logging.py", line 302, in __enter__
    self.proc.start()
  File "/home/joongi/backend.ai-dev/agent-1909/src/ai/backend/agent/server.py", line 660, in main
    with logger:
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/click/core.py", line 1114, in invoke
    return Command.invoke(self, ctx)
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/joongi/.pyenv/versions/venv-m4dm81nc-agent/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/joongi/backend.ai-dev/agent-1909/src/ai/backend/agent/server.py", line 687, in <module>
    sys.exit(main())
  <built-in method exec of module object at remote 0x7f0e6b435638>
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/joongi/.pyenv/versions/3.6.7/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
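The logger subprocess is blocked in `log_queue.get()` inside `log_worker` (ai/backend/common/logging.py, line 188); a blocking `get()` without a timeout returns only once a record or some shutdown signal arrives. A minimal sketch of a sentinel-terminated log consumer follows, assuming `None` as the shutdown marker. `drain_logs`, `run_logger`, and `send_sentinel` are hypothetical names, not the Backend.AI implementation. Because the trace shows the subprocess was started through `popen_fork`, the sketch uses the `fork` start method explicitly, so it runs on POSIX only.

```python
import multiprocessing as mp
from typing import Iterable, Optional

SENTINEL = None  # hypothetical shutdown marker; any unique picklable value works
ctx = mp.get_context("fork")  # matches the popen_fork frames above; POSIX only


def drain_logs(log_queue) -> None:
    # Blocks in get() until a record or the sentinel arrives,
    # like the logger subprocess in the trace above.
    while True:
        rec = log_queue.get()
        if rec is SENTINEL:
            break
        print("log:", rec, flush=True)


def run_logger(records: Iterable[str], timeout: float = 5.0,
               send_sentinel: bool = True) -> Optional[int]:
    """Feed records to a logger subprocess and return its exit code."""
    log_queue = ctx.Queue()
    proc = ctx.Process(target=drain_logs, args=(log_queue,))
    proc.start()
    for rec in records:
        log_queue.put(rec)
    if send_sentinel:
        log_queue.put(SENTINEL)  # without this, the child never leaves get()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()  # last resort: the child is stuck in get()
        proc.join()
    log_queue.close()
    log_queue.join_thread()  # flush and stop this queue's feeder thread
    return proc.exitcode


if __name__ == "__main__":
    print(run_logger(["starting", "shutting down"]))  # prints 0 after the two log lines
```

With `send_sentinel=False` the child stays in `get()` until the timeout expires and is then terminated, so the returned exit code is negative.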
achimnol commented 3 years ago

This is now mitigated by the rewritten server module based on the new fork module (#23).