AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: stable diffusion inside Docker crashes with error: ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed' #13785

Open l0ggik opened 1 year ago

l0ggik commented 1 year ago

Is there an existing issue for this?

What happened?

I tried to use the rocm/pytorch Docker image, but when starting stable-diffusion-webui following the install instructions, I get the following error: ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'

I tried downgrading to pytorch_lightning v1.7.7 and v1.6.5, but it had no effect.

Has anyone else had this problem and found a solution?
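One thing worth checking before downgrading: in recent pytorch_lightning releases the utilities.distributed module was removed (rank_zero_only now lives under pytorch_lightning.utilities.rank_zero), so the error usually means the container ships a newer Lightning than the bundled Stability-AI repo expects. A minimal diagnostic sketch to see which layout your environment actually has; the helper name module_available is mine, not part of webui:

```python
import importlib.util

def module_available(name: str) -> bool:
    """Return True if `name` can be imported in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec raises this when a parent package (e.g. pytorch_lightning)
        # is itself missing, so treat that as "not available" too
        return False

# Old layout (expected by the bundled Stability-AI repo):
print(module_available("pytorch_lightning.utilities.distributed"))
# New layout (Lightning 1.9+/2.x):
print(module_available("pytorch_lightning.utilities.rank_zero"))
```

If the first check prints False and the second True, either downgrade Lightning or patch the import in ddpm.py.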

Steps to reproduce the problem

  1. Install stable-diffusion-webui with Docker

What should have happened?

The webui should have started without errors.

Sysinfo

Linux Mint 20.1 (Ulyssa), AMD Radeon RX 580

What browsers do you use to access the UI ?

No response

Console logs

Python 3.9.5 (default, Nov 23 2021, 15:27:38) 
[GCC 9.3.0]
Version: v1.6.0
Commit hash: 5ef669de080814067961f28357256e8fe27544f4
Launching Web UI with arguments: --precision full --no-half --skip-torch-cuda-test
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
Traceback (most recent call last):
  File "/dockerx/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/dockerx/stable-diffusion-webui/launch.py", line 44, in main
    start()
  File "/dockerx/stable-diffusion-webui/modules/launch_utils.py", line 432, in start
    import webui
  File "/dockerx/stable-diffusion-webui/webui.py", line 13, in <module>
    initialize.imports()
  File "/dockerx/stable-diffusion-webui/modules/initialize.py", line 33, in imports
    from modules import shared_init
  File "/dockerx/stable-diffusion-webui/modules/shared_init.py", line 5, in <module>
    from modules import shared
  File "/dockerx/stable-diffusion-webui/modules/shared.py", line 5, in <module>
    from modules import shared_cmd_options, shared_gradio_themes, options, shared_items, sd_models_types
  File "/dockerx/stable-diffusion-webui/modules/sd_models_types.py", line 1, in <module>
    from ldm.models.diffusion.ddpm import LatentDiffusion
  File "/dockerx/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 20, in <module>
    from pytorch_lightning.utilities.distributed import rank_zero_only
ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'
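Besides downgrading Lightning, a common workaround is to patch the import in the bundled Stability-AI repo so it uses the new module path (rank_zero_only moved to pytorch_lightning.utilities.rank_zero in newer Lightning releases). A hedged sketch, assuming the /dockerx/stable-diffusion-webui checkout shown in the traceback; adjust the path for your install:

```shell
# Point ddpm.py at the new rank_zero_only location (Lightning 1.9+/2.x).
# Assumes the default checkout path from the traceback above.
cd /dockerx/stable-diffusion-webui
sed -i \
  's/from pytorch_lightning.utilities.distributed import rank_zero_only/from pytorch_lightning.utilities.rank_zero import rank_zero_only/' \
  repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py
```

Note that this edit lives inside a repository webui manages, so a later update of the stable-diffusion-stability-ai repo may revert it.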

Additional information

No response

gsteinLTU commented 1 year ago

I had to remove some extensions to get it working again after a similar error on a non-Docker install.

TheNexter commented 1 year ago

Same here, on both the main and dev branches :+1:

RX 6600 XT, Ubuntu 23.10, using Docker.

no module 'xformers'. Processing without...
Traceback (most recent call last):
  File "/dockerx/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/dockerx/stable-diffusion-webui/launch.py", line 44, in main
    start()
  File "/dockerx/stable-diffusion-webui/modules/launch_utils.py", line 432, in start
    import webui
  File "/dockerx/stable-diffusion-webui/webui.py", line 13, in <module>
    initialize.imports()
  File "/dockerx/stable-diffusion-webui/modules/initialize.py", line 33, in imports
    from modules import shared_init
  File "/dockerx/stable-diffusion-webui/modules/shared_init.py", line 5, in <module>
    from modules import shared
  File "/dockerx/stable-diffusion-webui/modules/shared.py", line 5, in <module>
    from modules import shared_cmd_options, shared_gradio_themes, options, shared_items, sd_models_types
  File "/dockerx/stable-diffusion-webui/modules/sd_models_types.py", line 1, in <module>
    from ldm.models.diffusion.ddpm import LatentDiffusion
  File "/dockerx/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 20, in <module>
    from pytorch_lightning.utilities.distributed import rank_zero_only
ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'

axxapy commented 11 months ago

Running pip install pytorch-lightning==1.6.5 helped me.


hchasens commented 9 months ago

Can confirm it's still present in Docker when using ROCm.

t3dc commented 6 months ago

Running into this same issue. As suggested, installing pytorch-lightning==1.6.5 helped, but only partly. Then I received:

No module named 'timm'

Running pip install timm got me past that, but only to another crash:

AttributeError: 'NoneType' object has no attribute '_id'
Creating model from config: /dockerx/stable-diffusion-webui/configs/v1-inference.yaml
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
  warnings.warn(
loading stable diffusion model: RuntimeError
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.9/lib/python3.9/threading.py", line 937, in _bootstrap
    self._bootstrap_inner()
  File "/opt/conda/envs/py_3.9/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/py_3.9/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/dockerx/stable-diffusion-webui/modules/initialize.py", line 149, in load_model
    shared.sd_model  # noqa: B018
  File "/dockerx/stable-diffusion-webui/modules/shared_items.py", line 175, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "/dockerx/stable-diffusion-webui/modules/sd_models.py", line 620, in get_sd_model
    load_model()
  File "/dockerx/stable-diffusion-webui/modules/sd_models.py", line 748, in load_model
    load_model_weights(sd_model, checkpoint_info, state_dict, timer)
  File "/dockerx/stable-diffusion-webui/modules/sd_models.py", line 393, in load_model_weights
    model.load_state_dict(state_dict, strict=False)
  File "/dockerx/stable-diffusion-webui/modules/sd_disable_initialization.py", line 223, in <lambda>
    module_load_state_dict = self.replace(torch.nn.Module, 'load_state_dict', lambda *args, **kwargs: load_state_dict(module_load_state_dict, *args, **kwargs))
  File "/dockerx/stable-diffusion-webui/modules/sd_disable_initialization.py", line 221, in load_state_dict
    original(module, state_dict, strict=strict)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2139, in load_state_dict
    load(self, state_dict)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2127, in load
    load(child, child_state_dict, child_prefix)  # noqa: F821
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2127, in load
    load(child, child_state_dict, child_prefix)  # noqa: F821
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2127, in load
    load(child, child_state_dict, child_prefix)  # noqa: F821
  [Previous line repeated 1 more time]
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2121, in load
    module._load_from_state_dict(
  File "/dockerx/stable-diffusion-webui/modules/sd_disable_initialization.py", line 225, in <lambda>
    linear_load_from_state_dict = self.replace(torch.nn.Linear, '_load_from_state_dict', lambda *args, **kwargs: load_from_state_dict(linear_load_from_state_dict, *args, **kwargs))
  File "/dockerx/stable-diffusion-webui/modules/sd_disable_initialization.py", line 191, in load_from_state_dict
    module._parameters[name] = torch.nn.parameter.Parameter(torch.zeros_like(param, device=device, dtype=dtype), requires_grad=param.requires_grad)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/_meta_registrations.py", line 4820, in zeros_like
    res.fill_(0)
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3.
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

Has anyone fought their way past this? I'm on an RX 6750 XT.
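Regarding the HIP "invalid device function" part specifically: RDNA2 cards like the RX 6750 XT (gfx1031) are often not in the kernel target list of prebuilt ROCm PyTorch wheels, and a widely used workaround is to report the card as gfx1030 via an environment variable. A sketch under that assumption; not guaranteed for every ROCm version:

```shell
# Report the GPU as gfx1030 (RX 6800/6900 class), an architecture the
# prebuilt ROCm kernels do target, then relaunch webui from the same shell.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
# python launch.py --precision full --no-half
```

If this helps, the variable can be added to the Docker run command (-e HSA_OVERRIDE_GFX_VERSION=10.3.0) so it persists across container restarts.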

egorderg commented 1 month ago

What helped me was upgrading Python to 3.10, following the article in the wiki: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs

EDIT: I got it to work on a 6750 XT with the following steps: update to Python 3.10 inside the Docker container, git clone the repo into dockerx, then ignore the remaining Docker instructions and instead follow the 'Arch Linux' steps under 'Setup venv environment' on the wiki page. Works flawlessly.