duanyiqun / Auto-ReID-Fast

A PyTorch implementation that uses DARTS to search for better structures for Re-ID

Run error #3

Closed: heerduo closed this issue 4 years ago

heerduo commented 4 years ago

When I run this command: srun -n 128 --gres 1 -p 0.5 python train_baseline_search_triplet.py --distributed True --config configs/Retrieval_classification_DARTS_distributed_triplet.yaml (I do not know what -n and -p are, or what values to give them)

it fails with: bash: srun: command not found

How can I fix this? Thanks

kaivanmehta commented 4 years ago

Hey, for that you need to install and set up Slurm. That is for multi-GPU use; if you want to run this on a single GPU, you don't need srun, simply run python [file_name] [arguments]

heerduo commented 4 years ago

I used this command: python train_baseline_search_triplet.py --distributed True --config configs/Retrieval_classification_DARTS_distributed_triplet.yaml

and it fails with: raise KeyError(key) from None, KeyError: 'SLURM_PROCID'
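
I guess the script reads the rank straight from the Slurm environment somewhere, something like this (just my guess at the pattern, not the exact code from the repo):

```python
import os

# SLURM_PROCID is only set when the process is launched through srun,
# so reading it outside Slurm raises KeyError: 'SLURM_PROCID'.
rank = int(os.environ['SLURM_PROCID'])
```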

kaivanmehta commented 4 years ago

If you have not set up Slurm, make sure the --distributed argument is False, then try again!

heerduo commented 4 years ago

python train_baseline_search_triplet.py --distributed False --config configs/Retrieval_classification_DARTS_distributed_triplet.yaml

I have changed --distributed to False and I am using one GPU, but it still does not work.

kaivanmehta commented 4 years ago

Do you face the same error?

heerduo commented 4 years ago

Yes

kaivanmehta commented 4 years ago

Can you tell me the line number?

heerduo commented 4 years ago

I cannot find the line number in the code.

heerduo commented 4 years ago

There is a block like if args.distributed: rank, world_size = ... Even though args.distributed is False, the rank, world_size lines still get executed.

I will delete the code inside the if args.distributed block.

kaivanmehta commented 4 years ago

Yeah, or you can just set those variables to some default values.
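
Something along these lines should work (just a sketch; the actual variable names used in the repo's distributed setup may differ):

```python
import os

# args is the namespace returned by the script's argparse parser.
if args.distributed:
    # These environment variables are set by Slurm when the job is launched with srun.
    rank = int(os.environ['SLURM_PROCID'])
    world_size = int(os.environ['SLURM_NTASKS'])
else:
    # Default values for a single-GPU, non-Slurm run.
    rank, world_size = 0, 1
```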

heerduo commented 4 years ago

OK

duanyiqun commented 4 years ago

> Hey, for that you need to install and set up Slurm. That is for multi-GPU use; if you want to run this on a single GPU, you don't need srun, simply run python [file_name] [arguments]

Thanks very much to kaivanmehta for pointing this out. I wrote this up in a rush. The code was mostly tested on Slurm for distributed training, so you need to set up Slurm to use srun.