Open kstavratis opened 1 year ago
That should not be the case. The sbatch command runs on the server and the job should run even if you close your SSH session.
I have experienced the same thing: job 3 crashes after running for approximately 16 h (15 h 59 min).
The likely reason is that the running-time limit of job 3 has been set to 16 h, and training seems to need more than 16 hours.
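If the time limit is indeed the cause, raising it in the batch script header might help. Below is a minimal sketch, not the repository's actual batch file: the job name, the 24-hour value, and the placeholder command are all assumptions, and the cluster may cap the time you are allowed to request.

```shell
#!/bin/bash
#SBATCH --job-name=train         # hypothetical job name
#SBATCH --time=24:00:00          # wall-clock limit as HH:MM:SS; assumed value above the 16 h that was hit
#SBATCH --output=slurm-%j.out    # %j expands to the Slurm job ID

# Placeholder: the real training command from the provided batch file goes here.
echo "training command goes here"
```

To check whether a finished job really hit its limit, `sacct -j <jobid> --format=JobID,State,Elapsed,Timelimit` should report the state `TIMEOUT` with an elapsed time matching the limit.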
@yolkarian If your experiment timed out, see this.
I've been trying to run the provided batch file with the sbatch command, after overcoming any hurdles with conda environments et cetera. Currently, I'm having an issue with Euler: it seems to stop executing the job (which then ends with a "crashed" status) whenever I turn off my computer. Has this happened to anyone, and if so, how did you resolve it? I submit the job according to the instructions provided in the README.md: