LSSTDESC / desc-wfmon

Workflow monitor for DESC image processing
BSD 3-Clause "New" or "Revised" License

Cannot run two parsltest jobs on the same node #8

Closed dladams closed 1 year ago

dladams commented 1 year ago

A second parsltest job on a node fails with this error:

RuntimeError: Could not bind to hub_port 55055 because: [Errno 98] Address already in use

Per Ben, we should not fix the monitoring port but let parsl choose it.
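The failure can be reproduced outside parsl with two plain sockets. This is only a minimal standard-library sketch of the underlying "Address already in use" condition, not parsl's monitoring code; the function name `second_bind_errno` is made up for illustration.

```python
import errno
import socket


def second_bind_errno():
    """Bind two sockets to the same port and return the errno of the second bind.

    Returns None if the second bind unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS for any free port, so this sketch does not
        # collide with a real service such as a running monitoring hub.
        first.bind(("127.0.0.1", 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            # A second bind to the same address and port is refused.
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print(second_bind_errno() == errno.EADDRINUSE)
```

A hard-coded port such as 55055 plays the role of `port` here: the first job on the node takes it and every later job hits the same error.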

dladams commented 1 year ago

I stopped specifying the monitoring port and no longer get the above error, but now see the same problem for the worker port:

Traceback (most recent call last):
  File "/home/descprod/conda/lib/python3.10/site-packages/work_queue.py", line 2018, in __init__
    raise Exception('Could not create queue on port {}'.format(port))
Exception: Could not create queue on port 9123

No surprise here. I will try letting parsl choose this port from a range, a capability that Ben once suggested either exists or could easily be added to parsl.

benclifford commented 1 year ago

tagging myself

dladams commented 1 year ago

DESCprod does not start parsl directly; it runs different applications that start parsl. It would therefore be convenient if the port range could be specified via an environment variable or configuration file instead of explicitly in the parsl config.

benclifford commented 1 year ago

with a recent desc parsl (after around 3rd january 2023), set the WorkQueueExecutor(port=...) parameter to port=0

Work Queue will then pick an arbitrary port to listen on - on my laptop, it picks starting from 1024.

You can set the range of ports by setting these environment variables:

WORK_QUEUE_HIGH_PORT=33399
WORK_QUEUE_LOW_PORT=33301

in the environment of the submitting process (i.e. the environment in which Python executes parsl.load()).
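To illustrate what a low/high port range means in practice, here is a standard-library sketch that scans such a range for the first port it can bind. It is not Work Queue's actual selection code: the function name `pick_port`, the `environ` parameter, and the default bounds are assumptions made for the example. Only the two environment variable names come from the comment above.

```python
import os
import socket


def pick_port(environ=None, default_low=1024, default_high=32767):
    """Return the first port in [low, high] that can be bound on localhost.

    The range is read from WORK_QUEUE_LOW_PORT and WORK_QUEUE_HIGH_PORT;
    the defaults here are assumed values for this sketch only.
    """
    environ = os.environ if environ is None else environ
    low = int(environ.get("WORK_QUEUE_LOW_PORT", default_low))
    high = int(environ.get("WORK_QUEUE_HIGH_PORT", default_high))
    if low > high:
        raise ValueError(f"empty port range: {low}..{high}")
    for port in range(low, high + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # in use or not permitted; try the next port
            return port
    raise RuntimeError(f"no free port in {low}..{high}")


if __name__ == "__main__":
    env = {"WORK_QUEUE_LOW_PORT": "33301", "WORK_QUEUE_HIGH_PORT": "33399"}
    print(pick_port(env))
```

The sketch releases the port before returning, so another process could claim it in the meantime. A real server keeps the socket it bound, which is why it is more reliable to ask the executor which port it ended up with than to predict it.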

Inside your program, after parsl.load() has initialised everything, you can ask the work queue executor to tell you the port it has chosen.

To do that, you will need to get the WorkQueueExecutor object - either by saving it in a variable when you are constructing the parsl configuration, or alternatively by extracting it from parsl after initialization.

Here is an example of the latter:

import parsl
from parsl.tests.configs.workqueue_blocks import config

parsl.load(config)

# Look up the executor by its label; use whichever label was used in your
# config in place of 'WorkQueueExecutor'.
executor = parsl.dfk().executors['WorkQueueExecutor']

# Ask the executor for the port that was chosen.
port = executor._port_mailbox.port

print(f"Port is: {port}")

which prints 33301 as the chosen port with the above environment variables set.

dladams commented 1 year ago

Thank you. You probably told me this earlier and I forgot. I appreciate your patience.

dladams commented 1 year ago

I set the port to 0 in parsltest 0.26.11.

dladams commented 1 year ago

With that version, I can now run two parsltest jobs simultaneously with DESCprod on my laptop. I find the port in parsl.log:

dladams@131f705e8e2d:~/rundirs$ grep port  job000136/runinfo/000/parsl.log 
        port=0, 
        client_port_range=(55000, 56000), 
        hub_port=None, 
        hub_port_range=(55050, 56000), 
1674151291.521569 2023-01-19 18:01:31 WorkQueue-Submit-Process-5935 MainThread-274889851392 parsl.executors.workqueue.executor:861 _work_queue_submit_wait DEBUG: Requested port 0
1674151291.556122 2023-01-19 18:01:31 WorkQueue-Submit-Process-5935 MainThread-274889851392 parsl.executors.workqueue.executor:864 _work_queue_submit_wait DEBUG: Listening on port 1025
1674151291.606913 2023-01-19 18:01:31 MainProcess-5418 MainThread-274889851392 parsl.executors.workqueue.executor:363 start DEBUG: Actual listening port is 1025

This is parsl version 1.3.0-dev+desc-2023.01.18a.
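The grep above can also be done from Python when a script needs the port as a number. This is a small sketch that assumes the log keeps the "Actual listening port is N" wording shown in the output above; the helper name `find_listening_ports` is made up for illustration.

```python
import re

# Matches the DEBUG message seen in the parsl.log excerpt above.
PORT_LINE = re.compile(r"Actual listening port is (\d+)")


def find_listening_ports(log_text):
    """Return every port reported by an 'Actual listening port is N' line."""
    return [int(match.group(1)) for match in PORT_LINE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "parsl.executors.workqueue.executor:864 _work_queue_submit_wait "
        "DEBUG: Listening on port 1025\n"
        "parsl.executors.workqueue.executor:363 start "
        "DEBUG: Actual listening port is 1025\n"
    )
    print(find_listening_ports(sample))
```

For a real run, read the file first, for example `find_listening_ports(open("job000136/runinfo/000/parsl.log").read())`.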

dladams commented 1 year ago

Ben:

These variables:

export WORK_QUEUE_LOW_PORT=3123
export WORK_QUEUE_HIGH_PORT=3124

are not changing the port reported in parsl.log:

root@descprod-79c846cdcb-p79c7:/users/dladams/rundirs/job000015# grep port runinfo/000/parsl.log
        port=0, 
        client_port_range=(55000, 56000), 
        hub_port=None, 
        hub_port_range=(55050, 56000), 
1674166863.799676 2023-01-19 22:21:03 WorkQueue-Submit-Process-847 MainThread-140582125556800 parsl.executors.workqueue.executor:861 _work_queue_submit_wait DEBUG: Requested port 0
1674166863.801706 2023-01-19 22:21:03 WorkQueue-Submit-Process-847 MainThread-140582125556800 parsl.executors.workqueue.executor:864 _work_queue_submit_wait DEBUG: Listening on port 1025
1674166863.898888 2023-01-19 22:21:03 MainProcess-470 MainThread-140582125556800 parsl.executors.workqueue.executor:363 start DEBUG: Actual listening port is 1025

I am using parsl version 1.3.0-dev+desc-2023.01.18a. I also don't see those names if I grep the installed files for parsl or in work_queue.py.

dladams commented 1 year ago

Ben: I see the env variables above are not finding their way to the shell that starts parsl. Oops. I will fix that and report back on whether the ports are being set as expected.
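One way to check whether a variable reaches the process that starts parsl is to launch a child process and have it report what it sees. The sketch below does this with a child Python interpreter; the helper name `child_sees` is hypothetical and it does not involve parsl or DESCprod.

```python
import os
import subprocess
import sys


def child_sees(name, env):
    """Run a child Python process with `env` and return its value for `name`.

    Returns an empty string if the child does not have the variable.
    """
    code = "import os, sys; sys.stdout.write(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    env = dict(os.environ)
    env["WORK_QUEUE_LOW_PORT"] = "3123"
    print(child_sees("WORK_QUEUE_LOW_PORT", env))
```

A variable that is set in one shell but not exported, or set in a different shell from the one that launches the job, will come back empty from a check like this.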

dladams commented 1 year ago

OK, when I get the env straightened out, the ports reported in the parsl log are those expected from the range. Sorry for the false alarm.

We can now run multiple jobs and constrain the port range using the env variables above. I am closing this issue.