CompVis / latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

Cannot run on multiple GPUs #209

Open shencuifeng opened 1 year ago

shencuifeng commented 1 year ago

I started training with the command 'python main.py --base configs/autoencoder/vqmodel1.yaml -t --gpus 4,5' but got the following:


Everything else works fine and the number of steps per epoch is halved, but only one GPU is in use and only one process is started. How can I solve this?

OvO1111 commented 8 months ago


Have you found a solution yet? I am facing the same issue.

newbie2niubility commented 2 months ago

You can run it like this: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python -m torch.distributed.launch --nproc_per_node=6 main.py --base your_config.yaml -t --gpus 0,1,2,3,4,5

newbie2niubility commented 2 months ago

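In --nproc_per_node=x, x is the number of GPUs to use (one process per GPU), so it should match the number of devices listed in CUDA_VISIBLE_DEVICES and --gpus.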
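Applied to the original '--gpus 4,5' case, the same pattern would mean exposing only GPUs 4 and 5 through CUDA_VISIBLE_DEVICES, launching two processes, and passing the renumbered IDs 0,1 to --gpus, because CUDA_VISIBLE_DEVICES renumbers the visible devices starting from 0 inside each process. The small POSIX sh sketch below only prints such a command so it can be checked before running. build_launch_cmd is a hypothetical helper, not part of latent-diffusion, and the sketch assumes the torch.distributed.launch pattern from the comment above works for your setup; it has not been tested against the repository.

```shell
# Hypothetical helper (not part of latent-diffusion): prints, but does not run,
# a multi-GPU launch command following the torch.distributed.launch pattern above.
# Usage: build_launch_cmd <comma-separated physical GPU IDs> <config path>
build_launch_cmd() {
  physical_gpus=$1
  config=$2
  n=0
  local_ids=""
  # Split the comma-separated list using IFS (POSIX sh, no bash arrays).
  old_ifs=$IFS
  IFS=','
  for _gpu in $physical_gpus; do
    # CUDA_VISIBLE_DEVICES renumbers the visible GPUs from 0,
    # so the processes see devices 0..n-1 regardless of the physical IDs.
    local_ids="${local_ids:+$local_ids,}$n"
    n=$((n + 1))
  done
  IFS=$old_ifs
  # One process per visible GPU: --nproc_per_node equals the GPU count.
  echo "CUDA_VISIBLE_DEVICES=$physical_gpus python -m torch.distributed.launch --nproc_per_node=$n main.py --base $config -t --gpus $local_ids"
}

# Example: the GPUs from the original question.
build_launch_cmd "4,5" configs/autoencoder/vqmodel1.yaml
```

For "4,5" this prints a command with --nproc_per_node=2 and --gpus 0,1; for "0,1,2,3,4,5" it reproduces the six-GPU command from the comment above.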