facebookresearch / mae

PyTorch implementation of MAE https://arxiv.org/abs/2111.06377

sbatch: unrecognized option '--gpus-per-node=8' #119

Closed xiangtaowong closed 2 years ago

xiangtaowong commented 2 years ago

When I run python submitit_pretrain.py, I get this error:

sbatch: unrecognized option '--gpus-per-node=8'
sbatch: error: Batch job submission failed: Unable to contact slurm controller (connect failure)
  File "/home/wangxiangtao/mae-main/submitit_pretrain.py", line 126, in main
    job = executor.submit(trainer)
  File "/home/wangxiangtao/mae-main/submitit_pretrain.py", line 133, in <module>
    main()

and the .sh is as follows:

#!/bin/bash

# Parameters
#SBATCH --constraint=volta32gb
#SBATCH --cpus-per-task=10
#SBATCH --error=/home/wangxiangtao/mae-main/output_dir/%j_0_log.err
#SBATCH --gpus-per-node=8
#SBATCH --job-name=mae
#SBATCH --mem=320GB
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=8
#SBATCH --open-mode=append
#SBATCH --output=/home/wangxiangtao/mae-main/output_dir/%j_0_log.out
#SBATCH --partition=learnfair
#SBATCH --signal=USR2@120
#SBATCH --time=4320
#SBATCH --wckey=submitit

What should I change? Looking forward to your reply!

xiangtaowong commented 2 years ago

The issue can be avoided by running main_pretrain.py directly, i.e. without submitit or sbatch.
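
For anyone hitting the same error, here is a minimal sketch of how one might check up front whether the submitit route can work on a given machine. The two error lines point at two separate problems: the installed sbatch does not know --gpus-per-node (as far as I know this flag only exists in newer Slurm releases, roughly 19.05 onward), and no Slurm controller is reachable at all. The helper names below (help_mentions, probe_sbatch) are hypothetical and not part of the MAE repo, and the check assumes that sbatch --help lists the options it supports.

```python
import shutil
import subprocess


def help_mentions(option: str, help_text: str) -> bool:
    """Return True if `option` appears anywhere in the given help text."""
    return option in help_text


def probe_sbatch(option: str = "--gpus-per-node") -> str:
    """Classify the local Slurm setup (hypothetical helper, not part of MAE).

    Returns "no-sbatch" if the sbatch binary is not on PATH,
    "unsupported-option" if its help text does not mention `option`,
    and "ok" otherwise.
    """
    if shutil.which("sbatch") is None:
        return "no-sbatch"
    result = subprocess.run(
        ["sbatch", "--help"], capture_output=True, text=True, check=False
    )
    # Search both streams; which one carries the help text is an assumption.
    text = result.stdout + result.stderr
    return "ok" if help_mentions(option, text) else "unsupported-option"


if __name__ == "__main__":
    print(probe_sbatch())
```

If this prints anything other than ok, running main_pretrain.py directly is probably the simpler path. Even when it prints ok, the "Unable to contact slurm controller" line would still need to be resolved on the cluster side before sbatch can submit anything.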