benclifford opened this issue 1 day ago
To recreate, run the test in #3698 with pytest's output capturing disabled (the -s flag below) so that stderr is visible:
$ !p
pytest parsl/tests/test_htex/test_interchange_exit_bad_registration.py --config local -s
========================================== test session starts ===========================================
platform linux -- Python 3.12.6+, pytest-7.4.4, pluggy-1.4.0
Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket=<bucket_type>
rootdir: /home/benc/parsl/src/parsl/parsl/tests
configfile: pytest.ini
plugins: random-order-1.1.1, typeguard-2.13.3, cov-4.1.0, hypothesis-6.103.1
collected 1 item
parsl/tests/test_htex/test_interchange_exit_bad_registration.py /home/benc/parsl/virtualenv-3.12/bin/interchange.py:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
__import__('pkg_resources').require('parsl==1.3.0.dev0')
Exception in thread Interchange-Task-Puller:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.12/threading.py", line 1012, in run
Exception in thread Interchange-Command:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
self._target(*self._args, **self._kwargs)
File "/home/benc/parsl/src/parsl/parsl/process_loggers.py", line 26, in wrapped
self.run()
File "/usr/local/lib/python3.12/threading.py", line 1012, in run
r = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/benc/parsl/src/parsl/parsl/executors/high_throughput/interchange.py", line 213, in task_puller
msg = self.task_incoming.recv_pyobj()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/benc/parsl/virtualenv-3.12/lib/python3.12/site-packages/zmq/sugar/socket.py", line 975, in recv_pyobj
self._target(*self._args, **self._kwargs)
File "/home/benc/parsl/src/parsl/parsl/process_loggers.py", line 26, in wrapped
BENC: entering zmq ctx destroy
BENC: leaving zmq ctx destroy
BENC: entering zmq ctx destroy
BENC: leaving zmq ctx destroy
BENC: entering zmq ctx destroy
BENC: leaving zmq ctx destroy
.
============================================ warnings summary ============================================
../../virtualenv-3.12/lib/python3.12/site-packages/dateutil/tz/tz.py:37
/home/benc/parsl/virtualenv-3.12/lib/python3.12/site-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
EPOCH = datetime.datetime.utcfromtimestamp(0)
parsl/executors/workqueue/executor.py:43
/home/benc/parsl/src/parsl/parsl/executors/workqueue/executor.py:43: DeprecationWarning: 'import work_queue' is deprecated. Please instead use: 'import ndcctools.work_queue'
import work_queue as wq
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================== 1 passed, 2 warnings in 11.54s =====================================
In some situations in my replicator test, the interchange exits with this jumbled pair of stack traces, but with Unix exit code 0, not -6:
Exception in thread Interchange-Task-Puller:
Traceback (most recent call last):
File "/usr/local/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.12/threading.py", line 1012, in run
Exception in thread Interchange-Command:
Traceback (most recent call last):
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
File "/home/benc/parsl/src/parsl/parsl/process_loggers.py", line 26, in wrapped
r = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/benc/parsl/src/parsl/parsl/executors/high_throughput/interchange.py", line 213, in task_puller
msg = self.task_incoming.recv_pyobj()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/benc/parsl/virtualenv-3.12/lib/python3.12/site-packages/zmq/sugar/socket.py", line 975, in recv_pyobj
self.run()
File "/usr/local/lib/python3.12/threading.py", line 1012, in run
msg = self.recv(flags)
^^^^^^^^^^^^^^^^
File "zmq/backend/cython/socket.pyx", line 805, in zmq.backend.cython.socket.Socket.recv
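The trace above comes with exit code 0. That is consistent with ordinary Python behaviour, although the sketch below is not a diagnosis of the interchange. An uncaught exception in a non-main thread is reported by threading.excepthook as "Exception in thread <name>:" and ends only that thread. It does not set the process exit status. The thread name mirrors the traceback, but the function body and exception are hypothetical stand-ins.

```python
import subprocess
import sys
import textwrap

# Child program: a named thread dies with an uncaught exception.
# Hypothetical stand-in; this is not the interchange's code.
child_code = textwrap.dedent("""
    import threading

    def task_puller():
        raise RuntimeError("socket closed underneath recv")

    t = threading.Thread(target=task_puller, name="Interchange-Task-Puller")
    t.start()
    t.join()
""")

result = subprocess.run([sys.executable, "-c", child_code],
                        capture_output=True, text=True)

# threading.excepthook reports the failure on stderr...
print("Exception in thread Interchange-Task-Puller" in result.stderr)  # True
# ...but the process still exits with status 0.
print(result.returncode)  # 0
```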
Describe the bug
There are a few paths through which the interchange exits. The regular shutdown path, driven by the DFK, is to send a SIGTERM, which immediately kills the process.
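For comparison with the exit codes discussed here, the following is a minimal stdlib sketch of how a parent process sees a child killed by SIGTERM. A sleeping child stands in for the interchange (this is not Parsl code). On POSIX, subprocess reports a child terminated by signal N with return code -N.

```python
import signal
import subprocess
import sys

# Long-running child: a hypothetical stand-in for the interchange process.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Send SIGTERM, as the regular shutdown path does, then reap the child.
child.send_signal(signal.SIGTERM)
returncode = child.wait(timeout=10)

# Negative return code -N means "terminated by signal N".
print(returncode)                     # -15 on Linux
print(returncode == -signal.SIGTERM)  # True
```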
Another, rarer path uses kill_event, which is polled every 10 ms and is set when a particular form of incorrect worker registration is received. When that kill_event path is taken, the interchange exits with a SIGABRT and writes a pair of interleaved thread tracebacks like the ones above (or a variant) to stderr.
The interchange then exits (as desired), but with Unix exit code -6 (SIGABRT).
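The -6 follows the same negative-signal convention. In this sketch a child that calls os.abort() stands in for the interchange. The issue does not say what raises SIGABRT inside the real interchange, so this only illustrates how the exit code is reported.

```python
import signal
import subprocess
import sys

# Child that raises SIGABRT in itself via os.abort() (hypothetical stand-in).
result = subprocess.run([sys.executable, "-c", "import os; os.abort()"])

print(result.returncode)                     # -6 on Linux
print(result.returncode == -signal.SIGABRT)  # True
```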
This is probably mostly cosmetic: the interchange still exits.
To Reproduce
I will make a pull request with a demonstrator test.
Expected behavior
A clean exit.
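One possible shape for a clean exit on the kill_event path is sketched below using only the standard library. This is not the fix Parsl adopted. Threads receive with a short timeout and check the same event, and the main loop joins them before returning, so nothing is torn down underneath a blocked receive. A queue.Queue stands in for the ZMQ socket task_incoming seen in the traceback. The run_interchange function and its "bad" registration check are hypothetical.

```python
import queue
import threading


def task_puller(task_incoming: queue.Queue, kill_event: threading.Event) -> None:
    # Receive with a short timeout instead of blocking forever, so the
    # thread notices kill_event and returns on its own.
    while not kill_event.is_set():
        try:
            task_incoming.get(timeout=0.01)
        except queue.Empty:
            continue


def run_interchange(registrations: list[str]) -> int:
    kill_event = threading.Event()
    task_incoming: queue.Queue = queue.Queue()
    puller = threading.Thread(target=task_puller, args=(task_incoming, kill_event),
                              name="Interchange-Task-Puller")
    puller.start()

    pending = list(registrations)
    while not kill_event.is_set():
        # Hypothetical check standing in for "incorrect worker registration".
        if pending and pending.pop(0) == "bad":
            kill_event.set()
        kill_event.wait(timeout=0.01)  # poll roughly every 10 ms

    # Join worker threads before any shared resources are torn down,
    # then return normally so the process can exit with status 0.
    puller.join()
    return 0


print(run_interchange(["ok", "bad"]))  # 0
```

With no "bad" entry in registrations this toy loop never exits, which is intentional here: only the kill_event path is being sketched.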
Environment
My laptop, branched from Parsl 2024.11.11.