galaxyproject / pulsar

Distributed job execution application built for Galaxy
https://pulsar.readthedocs.io
Apache License 2.0

Pulsar cannot access remote login node host #374

Open peterg1t opened 2 months ago


The documentation states that "Pulsar can also login into a remote host before executing these commands if the job manager is not accessible from the Pulsar host." However, when I submit a job, I get the following error:
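Conceptually, logging into a remote host means each job-manager command is wrapped in an SSH call. The following is a minimal Python sketch under stated assumptions: it assumes the OpenSSH `ssh` client and reuses the host (`slurm-login`), user (`root`), and `-o ConnectTimeout=60` option that appear in the log below. `build_remote_command` is a hypothetical helper for illustration, not Pulsar's actual implementation.

```python
import shlex


def build_remote_command(hostname, username, options, command):
    """Return an argv list that runs `command` on `hostname` over SSH.

    Hypothetical helper: it mirrors the host, user and options logged by
    pulsar.managers.util.cli.shell.rsh, but it is not Pulsar code.
    """
    # The remote side receives one shell string, so quote each argument.
    remote = " ".join(shlex.quote(part) for part in command)
    return ["ssh", *options, f"{username}@{hostname}", remote]


argv = build_remote_command(
    "slurm-login",
    "root",
    ["-o", "ConnectTimeout=60"],
    ["sbatch", "/home/pulsar/files/staging/80/command.sh"],
)
print(argv)
```

If the submission is wrapped this way, `sbatch` runs on `slurm-login` and opens the script path there, so `/home/pulsar/files/staging/80/command.sh` would have to exist on the login node, not only on the Pulsar host.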

Sep 17 13:51:52 pulsar pulsar[176577]: 2024-09-17 13:51:52,627 DEBUG [pulsar.managers.base][[manager=_default_]-[action=preprocess]-[job=80]] job_id: 80 - checking tool file changeCase.pl
Sep 17 13:51:52 pulsar pulsar[176577]: 2024-09-17 13:51:52,627 WARNI [pulsar.managers.util.cli.shell.rsh][[manager=_default_]-[action=preprocess]-[job=80]] slurm-login
Sep 17 13:51:52 pulsar pulsar[176577]: 2024-09-17 13:51:52,627 WARNI [pulsar.managers.util.cli.shell.rsh][[manager=_default_]-[action=preprocess]-[job=80]] root
Sep 17 13:51:52 pulsar pulsar[176577]: 2024-09-17 13:51:52,627 WARNI [pulsar.managers.util.cli.shell.rsh][[manager=_default_]-[action=preprocess]-[job=80]] ['-o', 'ConnectTimeout=60']
Sep 17 13:51:52 pulsar pulsar[176577]: 2024-09-17 13:51:52,627 DEBUG [galaxy.tool_util.deps][[manager=_default_]-[action=preprocess]-[job=80]] Using dependency perl version 5.26 of type conda
Sep 17 13:51:52 pulsar pulsar[176577]: 2024-09-17 13:51:52,627 INFO  [pulsar.managers.util.cli.job.slurm][[manager=_default_]-[action=preprocess]-[job=80]] directory_mode
Sep 17 13:51:55 pulsar pulsar[176577]: 2024-09-17 13:51:55,834 INFO  [pulsar.managers.queued_cli][[manager=_default_]-[action=preprocess]-[job=80]] HELLO
Sep 17 13:51:55 pulsar pulsar[176577]: 2024-09-17 13:51:55,835 INFO  [pulsar.managers.queued_cli][[manager=_default_]-[action=preprocess]-[job=80]] {'stdout': '', 'stderr': 'sbatch: error: Unable to open file /home/pulsar/files/staging/80/command.sh\n', 'returncode': 0}
Sep 17 13:51:55 pulsar pulsar[176577]: 2024-09-17 13:51:55,835 INFO  [pulsar.managers.queued_cli][[manager=_default_]-[action=preprocess]-[job=80]]
Sep 17 13:51:55 pulsar pulsar[176577]: 2024-09-17 13:51:55,835 WARNI [pulsar.managers.queued_cli][[manager=_default_]-[action=preprocess]-[job=80]] Failed to obtain external id for job_id 80 and submission_command sbatch /home/pulsar/files/staging/80/command.sh
Sep 17 13:51:55 pulsar pulsar[176577]: 2024-09-17 13:51:55,835 DEBUG [pulsar.messaging.bind_amqp][[manager=_default_]-[action=preprocess]-[job=80]] Publishing Pulsar state change with status failed for job_id 80
Sep 17 13:51:55 pulsar pulsar[176577]: 2024-09-17 13:51:55,836 DEBUG [pulsar.client.amqp_exchange][[manager=_default_]-[action=preprocess]-[job=80]] [publish:4af62276-752e-11ef-96b1-020101230009] Begin publishing to key pulsar__status_update
Sep 17 13:51:55 pulsar pulsar[176577]: 2024-09-17 13:51:55,837 DEBUG [pulsar.client.amqp_exchange][[manager=_default_]-[action=preprocess]-[job=80]] [publish:4af62276-752e-11ef-96b1-020101230009] Have producer for publishing to key pulsar__status_update
Sep 17 13:51:55 pulsar pulsar[176577]: 2024-09-17 13:51:55,837 WARNI [pulsar.client.amqp_exchange][[manager=_default_]-[action=preprocess]-[job=80]] kombu version 5.0.2 does not support timeout argument to publish. Consider updating to 5.2.0 or newer
Sep 17 13:51:55 pulsar pulsar[176577]: 2024-09-17 13:51:55,859 DEBUG [pulsar.client.amqp_exchange][[manager=_default_]-[action=preprocess]-[job=80]] [publish:4af62276-752e-11ef-96b1-020101230009] Published to key pulsar__status_update
Sep 17 13:51:55 pulsar pulsar[176577]: 2024-09-17 13:51:55,859 ERROR [pulsar.managers.stateful][[manager=_default_]-[action=preprocess]-[job=80]] Failed job preprocessing for job 80:
Sep 17 13:51:55 pulsar pulsar[176577]: Traceback (most recent call last):
Sep 17 13:51:55 pulsar pulsar[176577]:   File "/home/pulsar/venv/lib/python3.6/site-packages/pulsar/managers/stateful.py", line 135, in _handling_of_preprocessing_state
Sep 17 13:51:55 pulsar pulsar[176577]:     **launch_kwds
Sep 17 13:51:55 pulsar pulsar[176577]:   File "/home/pulsar/venv/lib/python3.6/site-packages/pulsar/managers/queued_cli.py", line 60, in launch
Sep 17 13:51:55 pulsar pulsar[176577]:     raise Exception("Failed to obtain external id")
Sep 17 13:51:55 pulsar pulsar[176577]: Exception: Failed to obtain external id

I can confirm that I have SSH keys set up and that I can connect to the Slurm login node from the Pulsar server. Thank you very much in advance for your help!

Pedro
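Being able to SSH to the login node does not by itself show that the Pulsar staging directory is visible there. One way to check is sketched below, assuming OpenSSH on the Pulsar host and a POSIX `test` command on the login node. `remote_file_readable` is a hypothetical helper, not part of Pulsar; it avoids `capture_output` so it also runs on the Python 3.6 interpreter seen in the traceback.

```python
import shlex
import subprocess


def remote_file_readable(hostname, username, path, connect_timeout=60, run=subprocess.run):
    """Return True if `path` is readable on `hostname`, checked over SSH.

    Hypothetical diagnostic helper, not part of Pulsar. `run` can be swapped
    for a stub so the logic can be exercised without a real SSH connection.
    """
    argv = [
        "ssh",
        "-o", "ConnectTimeout={}".format(connect_timeout),
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        "{}@{}".format(username, hostname),
        "test -r " + shlex.quote(path),  # exits 0 only if the file is readable
    ]
    # stdout/stderr=PIPE instead of capture_output keeps this Python 3.6 compatible.
    result = run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                 universal_newlines=True)
    # Any non-zero status (including ssh's 255 on connection errors) means "no".
    return result.returncode == 0
```

Called from the Pulsar host as `remote_file_readable("slurm-login", "root", "/home/pulsar/files/staging/80/command.sh")`, a `False` result would be consistent with the `sbatch: error: Unable to open file` message in the log. Note that `ssh` itself exits with status 255 on connection failure, which this helper also reports as `False`.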