Environment
Covalent-Slurm plugin version: Custom branch off of develop (here), created so that I could log in. The code in question should not be impacted by this branch.
Python version: 3.9
Operating system: Linux
What is happening?
I tried submitting a SLURM job and got the following traceback.
Traceback (most recent call last):
  File "/home/arosen/anaconda3/envs/covalent/lib/python3.9/site-packages/covalent/executor/base.py", line 452, in execute
    result = await self.run(function, args, kwargs, task_metadata)
  File "/home/arosen/anaconda3/envs/covalent/lib/python3.9/site-packages/covalent_slurm_plugin/slurm.py", line 474, in run
    await self._poll_slurm(slurm_job_id, conn)
  File "/home/arosen/anaconda3/envs/covalent/lib/python3.9/site-packages/covalent_slurm_plugin/slurm.py", line 333, in _poll_slurm
    raise RuntimeError("Job failed with status:\n", status)
RuntimeError: ('Job failed with status:\n', '')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/arosen/anaconda3/envs/covalent/lib/python3.9/site-packages/covalent_dispatcher/_core/runner.py", line 293, in _run_task
    output, stdout, stderr, exception_raised = await executor._execute(
  File "/home/arosen/anaconda3/envs/covalent/lib/python3.9/site-packages/covalent/executor/base.py", line 421, in _execute
    return await self.execute(
  File "/home/arosen/anaconda3/envs/covalent/lib/python3.9/site-packages/covalent/executor/base.py", line 459, in execute
    await self.teardown(task_metadata=task_metadata)
  File "/home/arosen/anaconda3/envs/covalent/lib/python3.9/site-packages/covalent_slurm_plugin/slurm.py", line 505, in teardown
    remote_func_filename=self._remote_func_filename,
AttributeError: 'SlurmExecutor' object has no attribute '_remote_func_filename'
My guess is that self._remote_func_filename is not defined because the RuntimeError was raised before the attribute could be set, so teardown fails when it tries to read it.
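To illustrate the suspected mechanism, here is a minimal sketch that does not use Covalent at all. ToyExecutor and its method bodies are hypothetical stand-ins, not the plugin's code. The sketch assumes only what the traceback suggests: run() raises before the attribute is assigned, and teardown() is then called while that exception is being handled.

```python
import asyncio


class ToyExecutor:
    """Hypothetical stand-in for an executor; not the covalent_slurm_plugin code."""

    async def run(self, fail=True):
        if fail:
            # Simulate the polling step failing before the attribute is assigned.
            raise RuntimeError("Job failed with status:\n", "")
        self._remote_func_filename = "func.pkl"
        return "ok"

    async def teardown(self):
        # Reads an attribute that only exists if run() got far enough.
        return self._remote_func_filename

    async def execute(self):
        try:
            return await self.run()
        except RuntimeError:
            # Cleanup runs while the RuntimeError is still being handled.
            await self.teardown()
            raise


def demo():
    try:
        asyncio.run(ToyExecutor().execute())
    except AttributeError as err:
        # The original RuntimeError survives only as the chained context.
        return type(err).__name__, type(err.__context__).__name__
```

If the guess is right, the AttributeError is a secondary failure that hides the real cause (the failed job) from the caller.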
How can we reproduce the issue?
import time

import covalent as ct

# Executor arguments are redacted in this report.
executor = ct.executor.SlurmExecutor(<redacted>)


@ct.lattice
@ct.electron(executor=executor)
def add(val1, val2):
    # Sleep for longer than the job's walltime so that SLURM kills the job.
    time.sleep(10000)
    return val1 + val2


dispatch_id = ct.dispatch(add)(1, 2)
result = ct.get_result(dispatch_id, wait=True)
print(result)
What should happen?
The covalent task should abort gracefully.
Any suggestions?
I think this error happens anytime the job dies unexpectedly (e.g. hits the walltime or otherwise). It doesn't seem to "terminate gracefully."
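One possible direction, sketched here under assumptions rather than taken from the plugin: define the attribute up front and let teardown skip cleanup when nothing was created, so the original job failure propagates. GuardedExecutor, its cleaned list, and the file name are all hypothetical. The single-string error message is also an illustrative choice: passing two arguments to RuntimeError is what produces the tuple-style message seen in the traceback.

```python
import asyncio


class GuardedExecutor:
    """Hypothetical executor sketch; all names are illustrative only."""

    def __init__(self):
        # Define the attribute up front so teardown can always read it.
        self._remote_func_filename = None
        self.cleaned = []

    async def run(self, fail):
        if fail:
            # A single formatted string avoids the tuple-style message.
            raise RuntimeError("Job failed with status: ''")
        self._remote_func_filename = "func.pkl"
        return "ok"

    async def teardown(self):
        # Only clean up files that were actually created.
        if self._remote_func_filename is not None:
            self.cleaned.append(self._remote_func_filename)

    async def execute(self, fail):
        try:
            return await self.run(fail)
        finally:
            # Teardown no longer raises, so the job failure is not masked.
            await self.teardown()


def run_failed_job():
    executor = GuardedExecutor()
    try:
        asyncio.run(executor.execute(fail=True))
    except RuntimeError as err:
        return str(err), executor.cleaned
```

With this shape, a job that hits its walltime would surface as a RuntimeError describing the job status rather than as an unrelated AttributeError.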
Addendum
It seems that adding the parsable: "" option fixes the empty returned status, but otherwise the same issue arises.