MartinGrignard opened this issue 1 month ago.
Hi, sorry for the late reply; I just got back from vacation. This error usually happens when HPC-rocket fails to launch the Slurm job at all. Can you show me the contents of the Slurm job's log file?
Sorry for the delay, I was also away for the last two weeks.
I don't have a log from the job: `sacct` shows that it was never even submitted.
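Besides `sacct`, the submission can be checked directly from `sbatch`'s standard output: on success, `sbatch` prints `Submitted batch job <id>`. A minimal sketch, assuming the stdout of the remote `sbatch` call has been captured as a string (`parse_job_id` is a hypothetical helper, not part of HPC-rocket):

```python
import re
from typing import Optional

# Hypothetical helper, not part of HPC-rocket: sbatch prints
# "Submitted batch job <id>" when Slurm accepts a job.
_SUBMITTED = re.compile(r"Submitted batch job (\d+)")


def parse_job_id(sbatch_stdout: str) -> Optional[int]:
    """Return the Slurm job ID, or None if sbatch reported no submission."""
    match = _SUBMITTED.search(sbatch_stdout)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(parse_job_id("Submitted batch job 4242\n"))  # prints 4242
    print(parse_job_id("bash: sbatch: command not found\n"))  # prints None
```

A `None` result means the failure happened before Slurm accepted the job, which is consistent with nothing showing up in `sacct`.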
Could the submission fail because Slurm is provided as an environment module on our cluster? In that case it may not be loaded at the start of the session, depending on the type of session HPC-rocket opens.
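This hypothesis is plausible if HPC-rocket runs its commands over a non-interactive SSH channel (an assumption here): such a session is not a login shell, so `/etc/profile` and `~/.bash_profile`, where environment modules are usually initialised, are not read. A minimal sketch of a workaround that runs the command in a bash login shell after loading the module; the module name `slurm` and the helper `wrap_in_login_shell` are hypothetical:

```python
import shlex
from typing import Sequence


# Hypothetical helper, not part of HPC-rocket's API. "bash -l" starts a
# login shell, which sources the profile files that set up `module`.
def wrap_in_login_shell(command: str, modules: Sequence[str] = ("slurm",)) -> str:
    """Build a remote command line that loads the given modules first."""
    # The module name "slurm" is an assumption; check `module avail` on the cluster.
    loads = [f"module load {shlex.quote(m)}" for m in modules]
    script = " && ".join(loads + [command])
    # shlex.quote turns the whole script into a single argument for bash -c.
    return f"bash -l -c {shlex.quote(script)}"


if __name__ == "__main__":
    # prints: bash -l -c 'module load slurm && sbatch slurm.job'
    print(wrap_in_login_shell("sbatch slurm.job"))
```

Running the generated string by hand via `ssh` against the cluster is a quick way to test the hypothesis before changing any configuration.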
EDIT:
I just tried replacing the command with other commands, and none of them ran without raising an error. So, for some reason, the call to `cmd.wait_until_exit()` always returns a non-zero exit code.
Since HPC-rocket manages to copy the files to the remote, it does not look like a connection issue...
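When every command fails the same way, the exact exit code and standard error usually narrow it down. In POSIX shells, exit status 127 means the command was not found and 126 means it was found but could not be executed; a 127 here would support the module hypothesis above. A minimal sketch using only Python's standard library (`run_and_explain` is a hypothetical name; it runs the command locally through `sh -c`, so to check the cluster the same command would have to be sent over SSH):

```python
import subprocess

# Special exit statuses defined by POSIX shells.
SHELL_STATUS = {
    126: "command found but not executable",
    127: "command not found (check PATH / loaded modules)",
}


def run_and_explain(command: str) -> tuple[int, str]:
    """Run `command` through sh and return (exit code, readable reason)."""
    result = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
    code = result.returncode
    if code == 0:
        return code, "success"
    reason = SHELL_STATUS.get(code, "command ran but reported failure")
    # stderr often names the missing binary or module.
    stderr = result.stderr.strip()
    message = f"{reason}: {stderr}" if stderr else reason
    return code, message


if __name__ == "__main__":
    print(run_and_explain("sbatch --version"))
```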
Hi :wave:
I'm currently trying to run HPC-rocket to submit a job from my local machine (before integrating it into a GitLab CI/CD pipeline). I created a simple `slurm.job` that only prints the hostname, to check that the job runs properly. Here is my configuration:
When I run the following command:
I get the following output:
Since there are no additional logs, and the job runs when I submit it manually on the cluster, do you have any idea what the problem could be?
Thanks!