CompVis / taming-transformers

Taming Transformers for High-Resolution Image Synthesis
https://arxiv.org/abs/2012.09841
MIT License

Dockerfile/environment.yaml for CUDA Version 12.3? #239

Closed kajc10 closed 8 months ago

kajc10 commented 8 months ago

I have CUDA version 12.3, so the given PyTorch configurations do not work. I tried to adjust the dependency versions but could not put together a working config. Could you help me out?

With the current environment.yaml I get stuck at 'initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1'

EDIT: solved by installing the latest torch, torchvision, and torchaudio, plus pillow==8.4.0
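When adjusting dependency versions like this, it can help to check which versions actually ended up in the active environment. Below is a minimal sketch using only the Python standard library. The package list is an assumption based on the packages mentioned in this thread, and `installed_versions` is a hypothetical helper name, not part of the repository.

```python
from importlib import metadata

# Packages discussed in this thread (assumption: adjust the list as needed).
PACKAGES = ["torch", "torchvision", "torchaudio", "pillow", "pytorch-lightning"]


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

If torch imports successfully, `torch.version.cuda` and `torch.cuda.is_available()` additionally report the CUDA version the build targets and whether a GPU is visible.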

froestiago commented 7 months ago

Hi @kajc10, do you mind sharing your yaml or docker file for this? I need to train the model on a custom dataset and I'm having a lot of trouble adjusting the dependency versions, especially for pytorch-lightning. Thanks!

Han1018 commented 6 months ago

Hi @kajc10, I faced the same issue. How did you solve it?

froestiago commented 6 months ago

Hi @Han1018 I was able to make it work without any docker file, only using conda envs

After running

conda env create -f environment.yaml; conda activate taming

I uninstalled PyTorch (torch and torchvision). Then I installed the 1.8.1 + cu111 version:

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

After that I uninstalled pillow and reinstalled it using pip install pillow==9.5.0.

I do not remember very well, but you might come across an error regarding torch._six. If you do, replace from torch._six import string_classes with string_classes = str (reference)

Hope this helps 🤗
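The torch._six workaround mentioned above can also be written as a guarded import, so the same file runs whether or not the installed PyTorch still ships that module. This is only a sketch: which file contains the import is not stated in this thread, and `is_string` is a hypothetical helper added purely to show the shim in use.

```python
# Compatibility shim: torch._six is not available in recent PyTorch releases.
try:
    from torch._six import string_classes  # present in older PyTorch builds
except ImportError:
    # Fallback suggested in this thread; also taken if torch is not installed.
    string_classes = str


def is_string(value):
    """Return True if value is a string type according to string_classes."""
    return isinstance(value, string_classes)
```

With this in place, code that does `isinstance(x, string_classes)` keeps working in both cases, and no manual edit is needed when switching PyTorch versions.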

Han1018 commented 6 months ago

Hi @froestiago, Thank you sososo much. It works for me and helps me save a lot of time !!!

senp98 commented 6 months ago

These steps somehow resolved my problem with the hanging training process. Thank you!!