csslc / CCSR

Official codes of CCSR: Improving the Stability of Diffusion Models for Content Consistent Super-Resolution
https://csslc.github.io/project-CCSR/

Library pytorch_lightning.utilities.distributed problem #10

Open pierre1618 opened 5 months ago

pierre1618 commented 5 months ago

Issue Description

Hi,

After creating the ccsr virtual environment and running python3 inference_ccsr.py, I encountered the following error:

ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'

Environment Information

Resolution

To resolve the issue, I changed one import.

In CCSR/ldm/models/diffusion/ddpm_ccsr_stage2.py and CCSR/ldm/models/diffusion/ddpm_ccsr_stage1.py, I changed:

from pytorch_lightning.utilities.distributed import rank_zero_only

to:

from pytorch_lightning.utilities.rank_zero import rank_zero_only

With this change, the script ran successfully.
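If the same files need to import cleanly under both older and newer pytorch_lightning releases, one option is to try the new module path first and fall back to the old one. The sketch below is not part of the CCSR code; the helper name import_first is hypothetical, and it only assumes what this thread shows, namely that rank_zero_only lives in one of the two module paths depending on the installed version.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that provides it."""
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{path}: {exc}")
    raise ImportError(f"could not import {name!r}; tried: " + "; ".join(errors))


# Hypothetical usage in ddpm_ccsr_stage1.py / ddpm_ccsr_stage2.py
# (left commented out so the sketch runs without pytorch_lightning installed):
# rank_zero_only = import_first(
#     "rank_zero_only",
#     [
#         "pytorch_lightning.utilities.rank_zero",    # path that worked after the fix
#         "pytorch_lightning.utilities.distributed",  # path used by the original code
#     ],
# )
```

A plain try/except ImportError around the two import statements does the same job; the helper just keeps the candidate paths in one list.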

Steps to Reproduce

  1. Create the ccsr virtual environment.
  2. Run python3 inference_ccsr.py.

Limbicnation commented 5 months ago

I've hit an additional error that I suspect is related to the issue above:

OSError: /root/miniconda3/envs/ccsr/lib/python3.9/site-packages/torchaudio/lib/libtorchaudio.so: undefined symbol: _ZN3c104cuda9SetDeviceEi

I also get the original import error after installing PyTorch with pip3 install torch torchvision torchaudio:

from pytorch_lightning.utilities.distributed import rank_zero_only
ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'

csslc commented 5 months ago

Hi, the problem seems to be caused by the pytorch_lightning version. My environment: pytorch-lightning 1.4.2, torch 2.0.1+cu118, Python 3.10.10. You can reinstall those versions to see whether that resolves the problem.

Limbicnation commented 5 months ago

Hi @csslc

Thanks for your fast reply. I have installed PyTorch Lightning with pip install pytorch-lightning==1.4.2 and PyTorch with conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia in a Python 3.10.10 environment. However, I am still encountering the following error:

ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/mnt///anaconda3/envs/ccsr/lib/python3.10/site-packages/torchmetrics/utilities/data.py)

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Any assistance would be greatly appreciated.

Limbicnation commented 5 months ago

I believe the following steps resolved the issue for me:

pip uninstall torchmetrics
pip install torchmetrics==0.7

csslc commented 5 months ago

In my environment, torchmetrics is 0.6.0 and torchvision is 0.15.2+cu118. You can try those versions.

Limbicnation commented 5 months ago

I resolved the issue by running the following commands:

pip uninstall pytorch-lightning torch torchvision torchmetrics

pip install torch==2.0.1+cu118 -f https://download.pytorch.org/whl/torch_stable.html
pip install torchvision==0.15.2+cu118 -f https://download.pytorch.org/whl/torch_stable.html
pip install torchmetrics==0.6.0
pip install pytorch-lightning==1.4.2

pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu118
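To confirm that a reinstall actually produced the versions the maintainer reported, a small check like the one below may help. It is a sketch, not part of CCSR: the function names are hypothetical, and the version strings are copied from this thread. The comparison is an exact string match, so a build that reports its version without the +cu118 suffix (which may happen with conda installs) will show up as a mismatch; treat the output as a hint rather than a verdict.

```python
from importlib import metadata

# Versions reported by the maintainer in this thread.
EXPECTED = {
    "pytorch-lightning": "1.4.2",
    "torch": "2.0.1+cu118",
    "torchvision": "0.15.2+cu118",
    "torchmetrics": "0.6.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

Running the script inside the ccsr environment prints one line per package that differs and prints nothing when all four match.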