CompVis / taming-transformers

Taming Transformers for High-Resolution Image Synthesis
https://arxiv.org/abs/2012.09841
MIT License

NCCL error #227

Closed by alex-pv01 1 year ago

alex-pv01 commented 1 year ago

Hi, I am trying to train a VQGAN on a custom dataset. I followed the instructions in the README file. However, when I run the command

python main.py --base configs/custom_vqgan.yaml -t True --gpus 0,1

I get the following error:

[screenshot of the error message; image not preserved]

Note that I want to run this on a distributed server that has several RTX 3090 GPUs.

Any suggestions on how to solve this?

Thx

alex-pv01 commented 1 year ago

I fixed it by reinstalling a different version of torch with command: pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
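A quick way to check whether a reinstall like this applies to your setup is to compare the CUDA version your torch build targets with what the GPUs need. The sketch below assumes the error comes from a torch build that predates Ampere support (RTX 3090 cards need a CUDA 11.x build). Since the screenshot from the original post is not available, this is a guess at the cause, not a confirmed diagnosis. The helper names `cuda_supports_ampere` and `report` are made up for illustration.

```python
def cuda_supports_ampere(cuda_version):
    """Return True if a CUDA version string such as "11.0" is new enough
    for Ampere GPUs (assumed threshold: CUDA 11.0). None means a CPU-only build."""
    if cuda_version is None:
        return False
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    return (major, minor) >= (11, 0)


def report():
    """Print the torch build details that matter for multi-GPU (NCCL) training."""
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
        return

    print("torch version:       ", torch.__version__)
    print("built against CUDA:  ", torch.version.cuda)
    print("CUDA available:      ", torch.cuda.is_available())
    print("Ampere-capable build:", cuda_supports_ampere(torch.version.cuda))

    if torch.cuda.is_available():
        # List every visible GPU with its compute capability, e.g. (8, 6) for an RTX 3090.
        for index in range(torch.cuda.device_count()):
            name = torch.cuda.get_device_name(index)
            capability = torch.cuda.get_device_capability(index)
            print(f"GPU {index}: {name}, compute capability {capability}")


if __name__ == "__main__":
    report()
```

If "built against CUDA" shows something older than 11.0 on a machine with RTX 3090 cards, installing a cu110 (or newer) build as in the command above is a reasonable thing to try. If the build already targets CUDA 11.x, the cause is probably elsewhere.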

wwqy commented 8 months ago

> I fixed it by reinstalling a different version of torch with command: pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

I tried this, but it did not fix the bug. Are there any other methods?