Describe the bug
When the interchange fails to start, parsl will hang.
In the desc development branch, this particular error occurs because my local config sets address="localhost", which became invalid as of PR #2828 (see trace below), but the same hang occurs on any interchange startup failure.
2023-07-20 08:31:59.529 interchange:128 HTEX-Interchange(53526) MainThread __init__ [DEBUG] Initializing Interchange process
2023-07-20 08:31:59.529 interchange:134 HTEX-Interchange(53526) MainThread __init__ [INFO] Attempting connection to client at 127.0.0.1 on ports: 55785,55909,55898
2023-07-20 08:31:59.530 interchange:147 HTEX-Interchange(53526) MainThread __init__ [INFO] Connected to client
2023-07-20 08:31:59.530 interchange:31 HTEX-Interchange(53526) MainThread wrapped [ERROR] Exceptional ending for starter on thread MainThread
Traceback (most recent call last):
  File "/home/benc/parsl/src/parsl/parsl/process_loggers.py", line 27, in wrapped
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/parsl/executors/high_throughput/interchange.py", line 638, in starter
    ic = Interchange(*args, **kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/src/parsl/parsl/executors/high_throughput/interchange.py", line 171, in __init__
    self.worker_task_port = self.task_outgoing.bind_to_random_port(f"tcp://{self.interchange_address}",
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benc/parsl/virtualenv-3.11/lib/python3.11/site-packages/zmq/sugar/socket.py", line 498, in bind_to_random_port
    self.bind(f'{addr}:{port}')
  File "/home/benc/parsl/virtualenv-3.11/lib/python3.11/site-packages/zmq/sugar/socket.py", line 302, in bind
    super().bind(addr)
  File "zmq/backend/cython/socket.pyx", line 564, in zmq.backend.cython.socket.Socket.bind
  File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: No such device (addr='tcp://localhost:54518')
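The ZMQError above is consistent with ZeroMQ's tcp transport expecting a numeric address (or an interface name or wildcard) when binding, rather than a hostname such as localhost. As a rough sketch of how this kind of misconfiguration could be caught before the interchange is launched, the check below tests whether an address string is a numeric IP. The function name `is_numeric_address` is hypothetical and not part of parsl, and the check is deliberately stricter than what ZeroMQ accepts, since it rejects interface names and "*".

```python
import ipaddress


def is_numeric_address(address):
    """Return True if address parses as a numeric IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(address)
    except ValueError:
        # Hostnames such as "localhost" end up here.
        return False
    return True
```

With this check, "localhost" is rejected while "127.0.0.1" is accepted, which matches the failure seen in the trace.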
To Reproduce
Check out parsl master at commit c39700bc2283500e0327ed3d7f00cf0056f2dfed, apply the patch below, then run pytest.
--- a/parsl/executors/high_throughput/interchange.py
+++ b/parsl/executors/high_throughput/interchange.py
@@ -614,6 +614,7 @@ def starter(comm_q, *args, **kwargs):
     The executor is expected to call this function. The args, kwargs match that of the Interchange.__init__
     """
+    raise RuntimeError("Deliberate hang")
     setproctitle("parsl: HTEX interchange")
     # logger = multiprocessing.get_logger()
Expected behavior
If the interchange doesn't start properly (or if it dies any time later on during execution), this should be a halting terminal error for htex, not a hang.
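To illustrate the behaviour I'd expect, here is a minimal sketch of a parent process waiting for a child's startup message while also watching whether the child is still alive, so that a dead child turns into an exception rather than an indefinite wait. This is not parsl's actual code: `wait_for_startup`, `launch`, the timeout values and the simplified `starter` signature are all assumptions made for illustration, and only the general shape (a starter function that reports back over a queue) is taken from the patch above.

```python
import multiprocessing
import queue
import time


def starter(comm_q):
    # Hypothetical stand-in for the interchange entry point. It fails before
    # sending anything on comm_q, like the patched starter in this report.
    raise RuntimeError("Deliberate hang")


def wait_for_startup(proc, comm_q, timeout=10.0, poll=0.1):
    """Return the first message from comm_q.

    Raises RuntimeError if proc exits without sending a message, and
    TimeoutError if nothing arrives within roughly `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return comm_q.get(timeout=poll)
        except queue.Empty:
            pass
        if not proc.is_alive():
            # The child may have sent its message just before exiting,
            # so look once more before giving up.
            try:
                return comm_q.get(timeout=poll)
            except queue.Empty:
                raise RuntimeError(
                    f"child exited with code {proc.exitcode} before reporting startup"
                ) from None
        if time.monotonic() >= deadline:
            raise TimeoutError("child did not report startup in time")


def launch():
    # Start the child and wait for it to report in, failing fast if it dies.
    comm_q = multiprocessing.Queue()
    proc = multiprocessing.Process(target=starter, args=(comm_q,), daemon=True)
    proc.start()
    return wait_for_startup(proc, comm_q)
```

Calling `launch()` (under an `if __name__ == "__main__":` guard on platforms that do not fork) should raise RuntimeError shortly after the child exits, instead of blocking forever on the queue. The same liveness check could in principle run periodically after startup to cover the case where the interchange dies later during execution.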
Environment
my laptop