Closed xiangtaowong closed 2 years ago
When I run python submitit_pretrain.py, I get this error:
sbatch: unrecognized option '--gpus-per-node=8'
sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)
  File "/home/wangxiangtao/mae-main/submitit_pretrain.py", line 126, in main
    job = executor.submit(trainer)
  File "/home/wangxiangtao/mae-main/submitit_pretrain.py", line 133, in <module>
    main()
And the .sh file is as follows:
#!/bin/bash
# Parameters
#SBATCH --constraint=volta32gb
#SBATCH --cpus-per-task=10
#SBATCH --error=/home/wangxiangtao/mae-main/output_dir/%j_0_log.err
#SBATCH --gpus-per-node=8
#SBATCH --job-name=mae
#SBATCH --mem=320GB
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=8
#SBATCH --open-mode=append
#SBATCH --output=/home/wangxiangtao/mae-main/output_dir/%j_0_log.out
#SBATCH --partition=learnfair
#SBATCH --signal=USR2@120
#SBATCH --time=4320
#SBATCH --wckey=submitit
What should I change? Looking forward to your reply!
The issue can be avoided by running main_pretrain.py directly, that is, without going through submitit or sbatch.
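For anyone unsure which of the two scripts applies to their machine, here is a minimal sketch of a pre-flight check. It is not part of the MAE repository: the helper names `slurm_is_usable` and `choose_entry_point` are hypothetical, and it assumes that a plain `sinfo` call exits with a non-zero status when the Slurm controller cannot be reached. It only tells you whether Slurm looks usable at all; it does not detect an older Slurm version that rejects `--gpus-per-node`.

```python
import shutil
import subprocess


def slurm_is_usable(timeout_s: float = 10.0) -> bool:
    """Return True only if sbatch and sinfo exist and sinfo succeeds.

    Assumption: sinfo returns a non-zero exit code when it cannot
    contact the Slurm controller.
    """
    if shutil.which("sbatch") is None or shutil.which("sinfo") is None:
        return False
    try:
        result = subprocess.run(
            ["sinfo"], capture_output=True, text=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        # Treat a missing binary or a hanging controller as "not usable".
        return False
    return result.returncode == 0


def choose_entry_point() -> str:
    """Pick the script to run, using the file names mentioned in this thread."""
    return "submitit_pretrain.py" if slurm_is_usable() else "main_pretrain.py"


if __name__ == "__main__":
    print(choose_entry_point())
```

If this prints main_pretrain.py, launching that script directly (as suggested above) sidesteps sbatch entirely.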