AgnostiqHQ / covalent-slurm-plugin

Executor plugin interfacing Covalent with Slurm
https://covalent.xyz
Apache License 2.0

Error when using SlurmExecutor on RHEL8 compute nodes #42

Open svandenhaute opened 1 year ago

svandenhaute commented 1 year ago

Environment

What is happening?

asyncssh seems to have trouble sending the command that creates a directory on the compute node. I don't know exactly what is going on, but based on this article I'd conclude that some HPC systems do not tolerate a login shell started without a terminal, because `/etc/profile` runs the legacy command `mesg n`, which fails when no TTY is attached.
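The error text itself points at the mechanism: `mesg` first looks up the terminal attached to one of its standard streams, and that lookup relies on a terminal ioctl that fails when no terminal is present. Below is a minimal Python sketch of the same failure, assuming a Unix-like system. A pipe stands in for the terminal-less stdio of a non-interactive SSH session, and `describe_terminal` is a hypothetical helper (it does not invoke `mesg`).

```python
import os


def describe_terminal(fd: int) -> str:
    """Hypothetical helper: mimic the first thing `mesg` does, i.e. look up
    the terminal device attached to file descriptor `fd`."""
    try:
        return os.ttyname(fd)
    except OSError as exc:
        # With no terminal attached the lookup fails with ENOTTY, whose
        # message is "Inappropriate ioctl for device".
        return f"ttyname failed: {exc.strerror}"


# A pipe stands in for the stdio of a PTY-less SSH session.
read_fd, write_fd = os.pipe()
try:
    print(os.isatty(read_fd))  # False: the same check `tty -s` performs on stdin
    print(describe_terminal(read_fd))
finally:
    os.close(read_fd)
    os.close(write_fd)
```

The `os.isatty` check is the Python analogue of the `tty -s` guard: `tty -s` succeeds only when stdin is a terminal, so `tty -s && mesg n` skips `mesg` in exactly the situation shown here.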

How can we reproduce the issue?

```python
import covalent as ct
import numpy as np

# Both electrons are explicitly pinned to the local executor.
@ct.electron(executor='local')
def sum_(n):
    return np.sum(np.arange(n))

@ct.electron(executor='local')
def product_(n):
    return np.prod(np.arange(n)[1:])

def get_sum_product(n):
    return sum_(n) + product_(n)

if __name__ == '__main__':
    # 'slurm' is the default executor for any task in the lattice
    # that does not set its own executor.
    workflow = ct.lattice(get_sum_product, executor='slurm')
    dispatch_id = ct.dispatch(workflow)(10)
```

What should happen?

```
[2022-11-15 13:22:54,295] [ERROR] execution.py: Line 364 in _run_task: Exception occurred when running task 4: mesg: ttyname failed: Inappropriate ioctl for device
[2022-11-15 13:22:54,297] [ERROR] execution.py: Line 372 in _run_task: Run task exception
Traceback (most recent call last):
  File "/home/sandervandenhaute/envs/covalent_env/pyenv/lib/python3.10/site-packages/covalent_dispatcher/_core/execution.py", line 345, in _run_task
    output, stdout, stderr = await execute_callable()
  File "/home/sandervandenhaute/envs/covalent_env/pyenv/lib/python3.10/site-packages/covalent/executor/base.py", line 572, in execute
    result = await self.run(function, args, kwargs, task_metadata)
  File "/home/sandervandenhaute/envs/covalent_env/pyenv/lib/python3.10/site-packages/covalent_slurm_plugin/slurm.py", line 399, in run
    raise RuntimeError(client_err)
RuntimeError: mesg: ttyname failed: Inappropriate ioctl for device
```

Any suggestions?

Adding `request_pty='force'` to the `conn.run()` call seems to fix the issue, although the `mesg` message is still displayed in the log. Replacing `mesg n` with `tty -s && mesg n`, as suggested elsewhere, means editing `/etc/profile`, which requires root access that users will not always have.
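A minimal sketch combining both ideas, assuming `conn` is an open `asyncssh.SSHClientConnection` (for example from `asyncssh.connect(...)`). `run_remote`, `strip_benign_stderr` and `BENIGN_STDERR` are hypothetical names, not part of covalent-slurm-plugin. Rather than failing on any stderr output, the helper drops the known `mesg` warning and treats a non-zero exit status or any remaining stderr as an error; `force_pty=True` passes `request_pty='force'` through to asyncssh.

```python
import re

# Hypothetical allow-list of stderr lines known to be harmless login-script noise.
BENIGN_STDERR = [
    re.compile(r"^mesg: ttyname failed: Inappropriate ioctl for device$"),
]


def strip_benign_stderr(stderr: str) -> str:
    """Return stderr with every line matching a BENIGN_STDERR pattern removed."""
    kept = [
        line
        for line in stderr.splitlines()
        if not any(pattern.match(line) for pattern in BENIGN_STDERR)
    ]
    return "\n".join(kept).strip()


async def run_remote(conn, command: str, force_pty: bool = False) -> str:
    """Hypothetical helper: run `command` over an asyncssh connection.

    `conn` is assumed to be an asyncssh.SSHClientConnection. Raises
    RuntimeError on a non-zero exit status or on stderr output that is
    not in the allow-list; otherwise returns stdout.
    """
    # request_pty='force' asks asyncssh to allocate a pseudo-terminal for the
    # session even though no terminal type is set.
    options = {"request_pty": "force"} if force_pty else {}
    result = await conn.run(command, **options)
    stderr = strip_benign_stderr(result.stderr or "")
    if result.exit_status != 0 or stderr:
        raise RuntimeError(stderr or f"{command!r} exited with status {result.exit_status}")
    return result.stdout
```

Usage would be `await run_remote(conn, "mkdir -p ~/covalent", force_pty=True)` inside an `async with asyncssh.connect(...) as conn:` block. One caveat with the PTY route: a pseudo-terminal typically merges the remote stderr into stdout, so a stderr-based check alone can no longer catch real errors, and the exit status becomes the more reliable signal.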

wjcunningham7 commented 1 year ago

Hi @svandenhaute thanks so much for this feedback and suggestion. We'll take a look into this and see if we can reproduce the issue.

CC: @AlejandroEsquivel