Hello,
I'm trying to use run_with_submitit.py to run the model on a Slurm cluster, but I don't get any output log file showing the training progress. The only logs I have are the ones from each node starting up.
Can you please help me with this multinode training?
Best regards,
Mehdi