Open shencuifeng opened 1 year ago
I start training with this command: 'python main.py --base configs/autoencoder/vqmodel1.yaml -t --gpus 4,5'. This is what I get: everything works fine and the steps in one epoch are halved, but only one GPU is in use and only one process is started. How can I solve this problem?
Have you found a solution yet? I am facing the same issue.
You can run it like this:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python -m torch.distributed.launch --nproc_per_node=6 main.py --base your_config.yaml -t --gpus 0,1,2,3,4,5
In `--nproc_per_node=x`, x is the number of GPUs you want to use (one process is launched per GPU).
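If you want to use a different set of GPUs (for example 4 and 5, as in the original question), the command above can be generated with a small helper. This is only a sketch: `build_launch_command` is a hypothetical name, not part of the repo. It also assumes that `--gpus` should list indices relative to the visible devices, because CUDA renumbers the devices in `CUDA_VISIBLE_DEVICES` starting from 0.

```python
import shlex


def build_launch_command(gpu_ids, config_path, script="main.py"):
    """Build a torch.distributed.launch command line for the given physical GPU ids.

    Hypothetical helper for illustration only; it returns the command as a
    string and does not run anything.
    """
    if not gpu_ids:
        raise ValueError("gpu_ids must contain at least one GPU id")

    # Physical ids exposed to the processes, e.g. "4,5".
    visible = ",".join(str(g) for g in gpu_ids)
    # Assumption: inside the processes the visible devices are renumbered
    # from 0, so --gpus gets "0,1,..." rather than the physical ids.
    local = ",".join(str(i) for i in range(len(gpu_ids)))

    args = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",  # one process per GPU
        script, "--base", config_path, "-t", "--gpus", local,
    ]
    return f"CUDA_VISIBLE_DEVICES={visible} " + " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(build_launch_command([4, 5], "configs/autoencoder/vqmodel1.yaml"))
```

Passing `[0, 1, 2, 3, 4, 5]` gives the same layout as the command above, with `--nproc_per_node=6`.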