Closed: heerduo closed this issue 4 years ago
Hey, for that you need to install and set up Slurm. That is only needed for multi-GPU training. If you want to run this on a single GPU, you don't need srun; simply run python [file_name] [arguments]
I use this command: python train_baseline_search_triplet.py --distributed True --config configs/Retrieval_classification_DARTS_distributed_triplet.yaml
It fails with: raise KeyError(key) from None KeyError: 'SLURM_PROCID'
If you have not set up Slurm, make sure the --distributed argument is False, then try again:
python train_baseline_search_triplet.py --distributed False --config configs/Retrieval_classification_DARTS_distributed_triplet.yaml
I have changed it to --distributed False and I am using one GPU. It still does not work.
Do you face the same error?
Yes
Can you tell me the line number?
I cannot find the line number in the code.
It is the block that starts with if args.distributed and then assigns rank, world_size= Even though args.distributed is set to False, it still executes the rank, world_size assignment.
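One possible explanation for the behavior described above, offered as a sketch only: the thread does not show how the script defines --distributed, but if it uses argparse with type=bool (or keeps the value as a string), then the text "False" is a non-empty string and therefore truthy in Python. The str2bool helper below is hypothetical and not part of the repository.

```python
import argparse


def str2bool(value: str) -> bool:
    """Hypothetical helper: convert common true/false spellings to a bool."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes", "y"):
        return True
    if lowered in ("false", "0", "no", "n"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# Assumed style of the original script: type=bool calls bool("False"),
# and any non-empty string converts to True.
naive = argparse.ArgumentParser()
naive.add_argument("--distributed", type=bool, default=False)
naive_args = naive.parse_args(["--distributed", "False"])

# Alternative: parse the text explicitly so "False" really means False.
fixed = argparse.ArgumentParser()
fixed.add_argument("--distributed", type=str2bool, default=False)
fixed_args = fixed.parse_args(["--distributed", "False"])

print(naive_args.distributed)  # True
print(fixed_args.distributed)  # False
```

If the script does parse the flag this way, leaving --distributed off the command line entirely (so the default is used) would be another way to keep the distributed branch from running.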
I will delete the code inside the if args.distributed block.
Yeah, or you can set those variables to some fixed values.
OK
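The suggestion above to give those variables fixed values could look roughly like the following. This is a minimal sketch, not the repository's code: get_rank_and_world_size is a hypothetical helper, and reading the world size from SLURM_NTASKS is an assumption (only SLURM_PROCID appears in the error message).

```python
import os
from typing import Tuple


def get_rank_and_world_size(distributed: bool) -> Tuple[int, int]:
    """Hypothetical helper: use Slurm variables when present, else one process."""
    if distributed and "SLURM_PROCID" in os.environ:
        # Launched through srun: Slurm exports these for every task.
        rank = int(os.environ["SLURM_PROCID"])
        # Assumption: world size is the total number of Slurm tasks.
        world_size = int(os.environ.get("SLURM_NTASKS", "1"))
        return rank, world_size
    # Single-GPU fallback: one process with rank 0.
    return 0, 1


rank, world_size = get_rank_and_world_size(distributed=False)
print(rank, world_size)  # 0 1
```

Using a membership check instead of os.environ["SLURM_PROCID"] directly avoids the KeyError when the script is started with plain python rather than srun.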
Thanks very much, kaivanmehta, for pointing this out. I wrote this in a rush. The code was mostly tested on Slurm for distributed training. You need to set up Slurm to use srun.
When I run this command: srun -n 128 --gres 1 -p 0.5 python train_baseline_search_triplet.py --distributed True --config configs/Retrieval_classification_DARTS_distributed_triplet.yaml (I do not know what -n and -p mean or what values they should take)
I get this error: bash: srun: command not found
How can I fix it? Thanks
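The message "bash: srun: command not found" means the shell could not find an srun executable on PATH, which usually indicates that Slurm is not installed on that machine. A quick way to check this from Python is sketched below; slurm_available is a hypothetical helper name, not something the repository provides.

```python
import shutil


def slurm_available() -> bool:
    """Hypothetical helper: True if an srun executable is found on PATH."""
    return shutil.which("srun") is not None


if slurm_available():
    print("srun found: the distributed launch command can be used")
else:
    print("srun not found: run the script directly with python on a single GPU")
```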