Parsl / parsl

Parsl - a Python parallel scripting library
http://parsl-project.org
Apache License 2.0

Interchange SIGABRT on kill_event driven exit (only cosmetic?) #3697

Open benclifford opened 1 day ago

benclifford commented 1 day ago

Describe the bug

There are a few paths through which the interchange exits. The regular shutdown path, driven by the DFK, is to send a SIGTERM, which immediately kills the process.

Another, rarer path uses kill_event, which is polled every 10ms and is set when a particular form of incorrect worker registration is received.
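
For readers unfamiliar with the pattern, here is a minimal sketch of an event-polled exit loop. It is not the interchange code: main_loop, POLL_PERIOD_S and the timer that sets the event are all hypothetical stand-ins, and only the 10ms period and the kill_event name come from the description above.

```python
import threading
import time

POLL_PERIOD_S = 0.01  # 10ms, the polling interval mentioned above


def main_loop(kill_event: threading.Event) -> int:
    """Run until kill_event is set; return how many poll iterations ran."""
    iterations = 0
    while not kill_event.is_set():
        # Stand-in for one bounded unit of work (e.g. a poll with a timeout).
        time.sleep(POLL_PERIOD_S)
        iterations += 1
    return iterations


kill_event = threading.Event()
# Hypothetical trigger: something else sets the event a little later,
# standing in for the handler that sees a bad worker registration.
trigger = threading.Timer(0.05, kill_event.set)
trigger.start()
iterations = main_loop(kill_event)
trigger.join()
print(f"loop stopped after {iterations} iterations")
```

The loop only notices the event between iterations, so any other thread that is blocked indefinitely (for example in a receive call with no timeout) never sees it.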

When that kill_event path is taken, the interchange exits with a SIGABRT, placing this (or a variant) on stderr:

Exception in thread Interchange-Task-Puller:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.12/threading.py", line 1012, in run
Exception in thread Interchange-Command:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self._target(*self._args, **self._kwargs)
  File "/home/benc/parsl/src/parsl/parsl/process_loggers.py", line 26, in wrapped
    self.run()
  File "/usr/local/lib/python3.12/threading.py", line 1012, in run
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/parsl/executors/high_throughput/interchange.py", line 213, in task_puller
    self._target(*self._args, **self._kwargs)
    msg = self.task_incoming.recv_pyobj()
  File "/home/benc/parsl/src/parsl/parsl/process_loggers.py", line 26, in wrapped
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    r = func(*args, **kwargs)
  File "/home/benc/parsl/virtualenv-3.12/lib/python3.12/site-packages/zmq/sugar/socket.py", line 975, in recv_pyobj
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/parsl/executors/high_throughput/interchange.py", line 251, in _command_server
    command_req = self.command_channel.recv_pyobj()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    msg = self.recv(flags)
  File "/home/benc/parsl/virtualenv-3.12/lib/python3.12/site-packages/zmq/sugar/socket.py", line 975, in recv_pyobj
          ^^^^^^^^^^^^^^^^
  File "zmq/backend/cython/socket.pyx", line 805, in zmq.backend.cython.socket.Socket.recv
    msg = self.recv(flags)
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x0000560f81d13238)

Current thread 0x00007fcb0a6eb740 (most recent call first):
  <no Python frame>

Extension modules: zmq.backend.cython.context, zmq.backend.cython.message, zmq.backend.cython.socket, zmq.backend.cython._device, zmq.backend.cython._poll, zmq.backend.cython._proxy_steerable, zmq.backend.cython._version, zmq.backend.cython.error, zmq.backend.cython.utils, setproctitle._setproctitle, sqlalchemy.cimmutabledict, greenlet._greenlet, sqlalchemy.cprocessors, sqlalchemy.cresultproxy, psutil._psutil_linux, psutil._psutil_posix, charset_normalizer.md, _cffi_backend, yaml._yaml, ndcctools._cwork_queue, ndcctools._cresource_monitor (total: 21)

The interchange then exits (as desired) but with unix exit code -6, SIGABRT.

This is probably mostly cosmetic: the interchange still exits.
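
As a side note on reading that exit code: on POSIX, Python's subprocess module reports a child killed by signal N as returncode -N. The sketch below uses a throwaway child that calls os.abort() as a stand-in for the interchange; it is only an illustration and assumes a POSIX platform.

```python
import signal
import subprocess
import sys

# Stand-in child process that aborts itself, raising SIGABRT.
proc = subprocess.run(
    [sys.executable, "-c", "import os; os.abort()"],
    stderr=subprocess.DEVNULL,
)

if proc.returncode < 0:
    # A negative return code means "terminated by this signal number".
    sig = signal.Signals(-proc.returncode)
    description = f"killed by {sig.name}"
else:
    description = f"exited with status {proc.returncode}"

print(proc.returncode, description)
```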

To Reproduce

I will make a pull request with a demonstrator test.

Expected behavior

A clean exit.
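
One possible shape for a clean exit, offered as a sketch rather than a proposed fix: give the blocking receive a timeout so the thread can re-check kill_event, then join the thread before the process exits. Here queue.Queue stands in for the ZMQ socket, and task_puller, received and the thread name are hypothetical; a real ZMQ version would need a poller or receive timeout instead.

```python
import queue
import threading


def task_puller(incoming, kill_event, received, poll_period_s=0.01):
    """Receive messages until kill_event is set, waking every poll_period_s."""
    while not kill_event.is_set():
        try:
            # Bounded wait instead of an indefinitely blocking receive.
            msg = incoming.get(timeout=poll_period_s)
        except queue.Empty:
            continue
        received.append(msg)
        incoming.task_done()


incoming = queue.Queue()
kill_event = threading.Event()
received = []

puller = threading.Thread(
    target=task_puller,
    args=(incoming, kill_event, received),
    name="Task-Puller-Sketch",
)
puller.start()

incoming.put("task-1")
incoming.join()  # wait until the message has been processed

kill_event.set()
puller.join(timeout=1.0)  # the thread finishes before interpreter shutdown
print("puller alive:", puller.is_alive())
```

Because the thread has already returned by the time the interpreter finalizes, there is no thread left to write a traceback to stderr during shutdown.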

Environment

My laptop, branched from Parsl 2024.11.11.

benclifford commented 1 day ago

To recreate, run the test in #3698 with output capturing disabled (pytest -s), so that stderr is visible:

$ pytest parsl/tests/test_htex/test_interchange_exit_bad_registration.py --config local -s
========================================== test session starts ===========================================
platform linux -- Python 3.12.6+, pytest-7.4.4, pluggy-1.4.0
Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket=<bucket_type>
rootdir: /home/benc/parsl/src/parsl/parsl/tests
configfile: pytest.ini
plugins: random-order-1.1.1, typeguard-2.13.3, cov-4.1.0, hypothesis-6.103.1
collected 1 item                                                                                         

parsl/tests/test_htex/test_interchange_exit_bad_registration.py /home/benc/parsl/virtualenv-3.12/bin/interchange.py:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  __import__('pkg_resources').require('parsl==1.3.0.dev0')
Exception in thread Interchange-Task-Puller:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.12/threading.py", line 1012, in run
Exception in thread Interchange-Command:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self._target(*self._args, **self._kwargs)
  File "/home/benc/parsl/src/parsl/parsl/process_loggers.py", line 26, in wrapped
    self.run()
  File "/usr/local/lib/python3.12/threading.py", line 1012, in run
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/parsl/executors/high_throughput/interchange.py", line 213, in task_puller
    msg = self.task_incoming.recv_pyobj()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/virtualenv-3.12/lib/python3.12/site-packages/zmq/sugar/socket.py", line 975, in recv_pyobj
    self._target(*self._args, **self._kwargs)
  File "/home/benc/parsl/src/parsl/parsl/process_loggers.py", line 26, in wrapped
BENC: entering zmq ctx destroy
BENC: leaving zmq ctx destroy
BENC: entering zmq ctx destroy
BENC: leaving zmq ctx destroy
BENC: entering zmq ctx destroy
BENC: leaving zmq ctx destroy
.

============================================ warnings summary ============================================
../../virtualenv-3.12/lib/python3.12/site-packages/dateutil/tz/tz.py:37
  /home/benc/parsl/virtualenv-3.12/lib/python3.12/site-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
    EPOCH = datetime.datetime.utcfromtimestamp(0)

parsl/executors/workqueue/executor.py:43
  /home/benc/parsl/src/parsl/parsl/executors/workqueue/executor.py:43: DeprecationWarning: 'import work_queue' is deprecated. Please instead use: 'import ndcctools.work_queue'
    import work_queue as wq

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================== 1 passed, 2 warnings in 11.54s =====================================

benclifford commented 1 day ago

In some situations in my replicator test, the interchange will exit with this jumbled pair of stack traces, but unix exit code 0, not -6:

Exception in thread Interchange-Task-Puller:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.12/threading.py", line 1012, in run
Exception in thread Interchange-Command:
Traceback (most recent call last):
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
  File "/home/benc/parsl/src/parsl/parsl/process_loggers.py", line 26, in wrapped
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/parsl/executors/high_throughput/interchange.py", line 213, in task_puller
    msg = self.task_incoming.recv_pyobj()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/virtualenv-3.12/lib/python3.12/site-packages/zmq/sugar/socket.py", line 975, in recv_pyobj
    self.run()
  File "/usr/local/lib/python3.12/threading.py", line 1012, in run
    msg = self.recv(flags)
          ^^^^^^^^^^^^^^^^
  File "zmq/backend/cython/socket.pyx", line 805, in zmq.backend.cython.socket.Socket.recv