AbdBarho / stable-diffusion-webui-docker

Easy Docker setup for Stable Diffusion with user-friendly UI

python: undefined symbol: cudaRuntimeGetVersion (8.1.0) #578

Closed kienerj closed 1 year ago

kienerj commented 1 year ago

Has this issue been opened before?

Describe the bug

After updating to 8.1.0, I get the error below when starting the container:

python: undefined symbol: cudaRuntimeGetVersion

Which UI

auto

Hardware / Software

Steps to Reproduce

  1. git checkout 8.1.0
  2. docker compose --profile download up --build
  3. docker compose --profile auto up --build

Error happens on Step 3.

AbdBarho commented 1 year ago

do you have the nvidia container toolkit installed?

kienerj commented 1 year ago

> do you have the nvidia container toolkit installed?

Yes, I merely upgraded from 8.0.0

EDIT:

There might be an issue with bitsandbytes:

https://github.com/huggingface/diffusers/issues/1207

The stack trace also starts right after the bitsandbytes installation:

bitsandbytes

Still, I would assume it would fail for everyone if that were the case?
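For what it's worth, the error message itself hints at what is going wrong inside bitsandbytes: it looks up `cudaRuntimeGetVersion` via ctypes, and when no real `libcudart` is found, the symbol lookup effectively falls back to the `python` binary itself, which produces exactly this `AttributeError`. A minimal sketch of the failure mode (Linux only; this just demonstrates the ctypes behavior, not the bitsandbytes code path):

```python
import ctypes

# On Linux, CDLL(None) opens the main program (the running `python` binary)
# instead of a real shared library -- roughly what happens when bitsandbytes
# cannot locate libcudart. The symbol lookup happens on attribute access.
lib = ctypes.CDLL(None)

try:
    lib.cudaRuntimeGetVersion
except AttributeError as e:
    # Message resembles: "python: undefined symbol: cudaRuntimeGetVersion"
    print(e)
```

So the message is less "CUDA is broken" and more "bitsandbytes resolved its CUDA runtime handle to the wrong binary".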

kienerj commented 1 year ago

Have any of the base containers been updated?

It now also fails when I downgrade to 8.0.0 (`git checkout 8.0.0`), with the same error.

I can run a basic CUDA container from NVIDIA just fine, so the nvidia container toolkit must be installed correctly.

I'm not in full control of the environment: it's a VM, so I'm limited to the host install and driver versions. Those should be updated soon, but for now I'm stuck on CUDA 11.4. Hence the question whether a base container was updated to a newer version; that would explain why 8.0.0 also fails, since it uses the same base container.

I can't find any other explanation.
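The toolkit sanity check mentioned above can be done with a one-liner; the image tag here is illustrative, pick one compatible with your host driver (CUDA 11.4 in this case):

```shell
# If the NVIDIA Container Toolkit is set up correctly, this should print the
# nvidia-smi GPU table from inside the container. Image tag is an example;
# match it to the host driver's supported CUDA version.
docker run --rm --gpus all nvidia/cuda:11.4.3-base-ubuntu20.04 nvidia-smi
```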

AbdBarho commented 1 year ago

maybe your cuda version is the problem?

I don't think a change in the base container is the reason; we use a bare-bones python container that should not have an effect on the installed dependencies.

Can you try again from latest master commit?

since you mentioned you just upgraded, do you have any extensions installed? if so, can you try to remove them and run again?

dstout-devops commented 1 year ago

I can replicate this issue by installing the dreambooth extension, for what it's worth.

EDIT: This appears to be related to CUDA and bitsandbytes. I was able to work around the issue by using the `pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime` container as a base.
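For reference, the workaround amounts to swapping the base image in the Dockerfile; a minimal sketch (assuming the remaining build stages are left as they are):

```Dockerfile
# Workaround sketch: replace the bare python base image with the PyTorch CUDA
# runtime image, which ships libcudart so bitsandbytes can resolve
# cudaRuntimeGetVersion at startup.
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
```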

AbdBarho commented 1 year ago

thanks for the info @dstout-devops, I am considering replacing the base container; the performance difference between CUDA 11.7 and 11.8 is huge for RTX cards, so I am waiting for their next release.

simonmcnair commented 1 year ago

I get this sometimes. It goes away if I rebuild the container.
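If a plain rebuild doesn't clear it, forcing a rebuild without Docker's layer cache may; for example:

```shell
# Rebuild the auto image ignoring cached layers, then start it.
docker compose --profile auto build --no-cache
docker compose --profile auto up
```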

eusonlito commented 1 year ago

Same here; my `docker compose --profile auto up --build` log:

[+] Building 1.6s (33/33) FINISHED                                                                                                                                                                                                             
 => [auto internal] load build definition from Dockerfile                                                                                                                                                                                 0.0s
 => => transferring dockerfile: 4.28kB                                                                                                                                                                                                    0.0s
 => [auto internal] load .dockerignore                                                                                                                                                                                                    0.0s
 => => transferring context: 2B                                                                                                                                                                                                           0.0s
 => [auto internal] load metadata for docker.io/library/python:3.10.9-slim                                                                                                                                                                1.1s
 => [auto internal] load metadata for docker.io/library/alpine:3.17                                                                                                                                                                       0.9s
 => [auto internal] load metadata for docker.io/alpine/git:2.36.2                                                                                                                                                                         1.5s
 => [auto download 1/9] FROM docker.io/alpine/git:2.36.2@sha256:ec491c893597b68c92b88023827faa771772cfd5e106b76c713fa5e1c75dea84                                                                                                          0.0s
 => [auto stage-2  1/14] FROM docker.io/library/python:3.10.9-slim@sha256:76dd18d90a3d8710e091734bf2c9dd686d68747a51908db1e1f41e9a5ed4e2c5                                                                                                0.0s
 => [auto xformers 1/3] FROM docker.io/library/alpine:3.17@sha256:f71a5f071694a785e064f05fed657bf8277f1b2113a8ed70c90ad486d6ee54dc                                                                                                        0.0s
 => [auto internal] load build context                                                                                                                                                                                                    0.0s
 => => transferring context: 149B                                                                                                                                                                                                         0.0s
 => CACHED [auto stage-2  2/14] RUN --mount=type=cache,target=/var/cache/apt   apt-get update &&   apt-get install -y fonts-dejavu-core rsync git jq moreutils aria2   ffmpeg libglfw3-dev libgles2-mesa-dev pkg-config libcairo2 libcai  0.0s
 => CACHED [auto stage-2  3/14] RUN --mount=type=cache,target=/cache --mount=type=cache,target=/root/.cache/pip   aria2c -x 5 --dir /cache --out torch-2.0.1-cp310-cp310-linux_x86_64.whl -c   https://download.pytorch.org/whl/cu118/to  0.0s
 => CACHED [auto stage-2  4/14] RUN --mount=type=cache,target=/root/.cache/pip   git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git &&   cd stable-diffusion-webui &&   git reset --hard 20ae71faa8ef035c31aa3a410b70  0.0s
 => CACHED [auto xformers 2/3] RUN apk add --no-cache aria2                                                                                                                                                                               0.0s
 => CACHED [auto xformers 3/3] RUN aria2c -x 5 --dir / --out wheel.whl 'https://github.com/AbdBarho/stable-diffusion-webui-docker/releases/download/6.0.0/xformers-0.0.21.dev544-cp310-cp310-manylinux2014_x86_64-pytorch201.whl'         0.0s
 => CACHED [auto stage-2  5/14] RUN --mount=type=cache,target=/root/.cache/pip    --mount=type=bind,from=xformers,source=/wheel.whl,target=/xformers-0.0.21.dev544-cp310-cp310-manylinux2014_x86_64.whl   pip install /xformers-0.0.21.d  0.0s
 => CACHED [auto download 2/9] COPY clone.sh /clone.sh                                                                                                                                                                                    0.0s
 => CACHED [auto download 3/9] RUN . /clone.sh taming-transformers https://github.com/CompVis/taming-transformers.git 24268930bf1dce879235a7fddd0b2355b84d7ea6   && rm -rf data assets **/*.ipynb                                         0.0s
 => CACHED [auto download 4/9] RUN . /clone.sh stable-diffusion-stability-ai https://github.com/Stability-AI/stablediffusion.git 47b6b607fdd31875c9279cd2f4f16b92e4ea958e   && rm -rf assets data/**/*.png data/**/*.jpg data/**/*.gif    0.0s
 => CACHED [auto download 5/9] RUN . /clone.sh CodeFormer https://github.com/sczhou/CodeFormer.git c5b4593074ba6214284d6acd5f1719b6c5d739af   && rm -rf assets inputs                                                                     0.0s
 => CACHED [auto download 6/9] RUN . /clone.sh BLIP https://github.com/salesforce/BLIP.git 48211a1594f1321b00f14c9f7a5b4813144b2fb9                                                                                                       0.0s
 => CACHED [auto download 7/9] RUN . /clone.sh k-diffusion https://github.com/crowsonkb/k-diffusion.git c9fe758757e022f05ca5a53fa8fac28889e4f1cf                                                                                          0.0s
 => CACHED [auto download 8/9] RUN . /clone.sh clip-interrogator https://github.com/pharmapsychotic/clip-interrogator 2486589f24165c8e3b303f84e9dbbea318df83e8                                                                            0.0s
 => CACHED [auto download 9/9] RUN . /clone.sh generative-models https://github.com/Stability-AI/generative-models 5c10deee76adad0032b412294130090932317a87                                                                               0.0s
 => CACHED [auto stage-2  6/14] COPY --from=download /repositories/ /stable-diffusion-webui/repositories/                                                                                                                                 0.0s
 => CACHED [auto stage-2  7/14] RUN mkdir /stable-diffusion-webui/interrogate && cp /stable-diffusion-webui/repositories/clip-interrogator/data/* /stable-diffusion-webui/interrogate                                                     0.0s
 => CACHED [auto stage-2  8/14] RUN --mount=type=cache,target=/root/.cache/pip   pip install -r /stable-diffusion-webui/repositories/CodeFormer/requirements.txt                                                                          0.0s
 => CACHED [auto stage-2  9/14] RUN --mount=type=cache,target=/root/.cache/pip   pip install pyngrok   git+https://github.com/TencentARC/GFPGAN.git@8d2447a2d918f8eba5a4a01463fd48e45126a379   git+https://github.com/openai/CLIP.git@d5  0.0s
 => CACHED [auto stage-2 10/14] RUN apt-get -y install libgoogle-perftools-dev && apt-get clean                                                                                                                                           0.0s
 => CACHED [auto stage-2 11/14] RUN --mount=type=cache,target=/root/.cache/pip   cd stable-diffusion-webui &&   git fetch &&   git reset --hard c9c8485bc1e8720aba70f029d25cba1c4abf2b5c &&   pip install -r requirements_versions.txt    0.0s
 => CACHED [auto stage-2 12/14] COPY . /docker                                                                                                                                                                                            0.0s
 => CACHED [auto stage-2 13/14] RUN   python3 /docker/info.py /stable-diffusion-webui/modules/ui.py &&   mv /stable-diffusion-webui/style.css /stable-diffusion-webui/user.css &&   sed -i 's/in_app_dir = .*/in_app_dir = True/g' /usr/  0.0s
 => CACHED [auto stage-2 14/14] WORKDIR /stable-diffusion-webui                                                                                                                                                                           0.0s
 => [auto] exporting to image                                                                                                                                                                                                             0.0s
 => => exporting layers                                                                                                                                                                                                                   0.0s
 => => writing image sha256:b946a6444675e8307df64d221bd12cc1b98b5998d36868d39890d214a99e6736                                                                                                                                              0.0s
 => => naming to docker.io/library/sd-auto:65                                                                                                                                                                                             0.0s
[+] Running 1/0
 ✔ Container webui-docker-auto-1  Created                                                                                                                                                                                                 0.0s 
Attaching to webui-docker-auto-1
webui-docker-auto-1  | Mounted .cache
webui-docker-auto-1  | Mounted config_states
webui-docker-auto-1  | Mounted .cache
webui-docker-auto-1  | Mounted embeddings
webui-docker-auto-1  | Mounted config.json
webui-docker-auto-1  | Mounted models
webui-docker-auto-1  | Mounted styles.csv
webui-docker-auto-1  | Mounted ui-config.json
webui-docker-auto-1  | Mounted extensions
webui-docker-auto-1  | Installing extension dependencies (if any)
webui-docker-auto-1  | If submitting an issue on github, please provide the full startup log for debugging purposes.
webui-docker-auto-1  | 
webui-docker-auto-1  | Initializing Dreambooth
webui-docker-auto-1  | Dreambooth revision: cf086c536b141fc522ff11f6cffc8b7b12da04b9
webui-docker-auto-1  | [+] xformers version 0.0.21.dev544 installed.
webui-docker-auto-1  | [+] torch version 2.0.1+cu118 installed.
webui-docker-auto-1  | [+] torchvision version 0.15.2+cu118 installed.
webui-docker-auto-1  | [+] accelerate version 0.21.0 installed.
webui-docker-auto-1  | [+] diffusers version 0.19.3 installed.
webui-docker-auto-1  | [+] transformers version 4.30.2 installed.
webui-docker-auto-1  | [+] bitsandbytes version 0.35.4 installed.
webui-docker-auto-1  | Traceback (most recent call last):
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1086, in _get_module
webui-docker-auto-1  |     return importlib.import_module("." + module_name, self.__name__)
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
webui-docker-auto-1  |     return _bootstrap._gcd_import(name[level:], package, level)
webui-docker-auto-1  |   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
webui-docker-auto-1  |   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
webui-docker-auto-1  |   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
webui-docker-auto-1  |   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
webui-docker-auto-1  |   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
webui-docker-auto-1  |   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 85, in <module>
webui-docker-auto-1  |     from accelerate import __version__ as accelerate_version
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/accelerate/__init__.py", line 3, in <module>
webui-docker-auto-1  |     from .accelerator import Accelerator
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/accelerate/accelerator.py", line 35, in <module>
webui-docker-auto-1  |     from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/accelerate/checkpointing.py", line 24, in <module>
webui-docker-auto-1  |     from .utils import (
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/accelerate/utils/__init__.py", line 131, in <module>
webui-docker-auto-1  |     from .bnb import has_4bit_bnb_layers, load_and_quantize_model
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/accelerate/utils/bnb.py", line 42, in <module>
webui-docker-auto-1  |     import bitsandbytes as bnb
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
webui-docker-auto-1  |     from .autograd._functions import (
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 5, in <module>
webui-docker-auto-1  |     import bitsandbytes.functional as F
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/functional.py", line 13, in <module>
webui-docker-auto-1  |     from .cextension import COMPILED_WITH_CUDA, lib
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 113, in <module>
webui-docker-auto-1  |     lib = CUDASetup.get_instance().lib
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 109, in get_instance
webui-docker-auto-1  |     cls._instance.initialize()
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 59, in initialize
webui-docker-auto-1  |     binary_name, cudart_path, cuda, cc, cuda_version_string = evaluate_cuda_setup()
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 125, in evaluate_cuda_setup
webui-docker-auto-1  |     cuda_version_string = get_cuda_version(cuda, cudart_path)
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 45, in get_cuda_version
webui-docker-auto-1  |     check_cuda_result(cuda, cudart.cudaRuntimeGetVersion(ctypes.byref(version)))
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
webui-docker-auto-1  |     func = self.__getitem__(name)
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
webui-docker-auto-1  |     func = self._FuncPtr((name_or_ordinal, self))
webui-docker-auto-1  | AttributeError: python: undefined symbol: cudaRuntimeGetVersion
webui-docker-auto-1  | 
webui-docker-auto-1  | 
webui-docker-auto-1  | The above exception was the direct cause of the following exception:
webui-docker-auto-1  | 
webui-docker-auto-1  | Traceback (most recent call last):
webui-docker-auto-1  |   File "/stable-diffusion-webui/webui.py", line 39, in <module>
webui-docker-auto-1  |     import pytorch_lightning   # noqa: F401 # pytorch_lightning should be imported after torch, but it re-enables warnings on import so import once to disable them
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/__init__.py", line 35, in <module>
webui-docker-auto-1  |     from pytorch_lightning.callbacks import Callback  # noqa: E402
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
webui-docker-auto-1  |     from pytorch_lightning.callbacks.batch_size_finder import BatchSizeFinder
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/callbacks/batch_size_finder.py", line 24, in <module>
webui-docker-auto-1  |     from pytorch_lightning.callbacks.callback import Callback
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/callbacks/callback.py", line 25, in <module>
webui-docker-auto-1  |     from pytorch_lightning.utilities.types import STEP_OUTPUT
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/utilities/types.py", line 27, in <module>
webui-docker-auto-1  |     from torchmetrics import Metric
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/torchmetrics/__init__.py", line 14, in <module>
webui-docker-auto-1  |     from torchmetrics import functional  # noqa: E402
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/torchmetrics/functional/__init__.py", line 120, in <module>
webui-docker-auto-1  |     from torchmetrics.functional.text._deprecated import _bleu_score as bleu_score
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/torchmetrics/functional/text/__init__.py", line 50, in <module>
webui-docker-auto-1  |     from torchmetrics.functional.text.bert import bert_score  # noqa: F401
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/torchmetrics/functional/text/bert.py", line 23, in <module>
webui-docker-auto-1  |     from torchmetrics.functional.text.helper_embedding_metric import (
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/torchmetrics/functional/text/helper_embedding_metric.py", line 27, in <module>
webui-docker-auto-1  |     from transformers import AutoModelForMaskedLM, AutoTokenizer, PreTrainedModel, PreTrainedTokenizerBase
webui-docker-auto-1  |   File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
webui-docker-auto-1  | 
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1076, in __getattr__
webui-docker-auto-1  |     module = self._get_module(self._class_to_module[name])
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1088, in _get_module
webui-docker-auto-1  |     raise RuntimeError(
webui-docker-auto-1  | RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
webui-docker-auto-1  | python: undefined symbol: cudaRuntimeGetVersion
webui-docker-auto-1 exited with code 1
eusonlito commented 1 year ago

Same without dreambooth extension:

[+] Building 0.6s (33/33) FINISHED
(build output identical to the log above; all layers CACHED, same image sha256:b946a644…)
[+] Running 1/0
 ✔ Container webui-docker-auto-1  Created                                                                                                                                                                                                 0.0s 
Attaching to webui-docker-auto-1
webui-docker-auto-1  | Mounted .cache
webui-docker-auto-1  | Mounted config_states
webui-docker-auto-1  | Mounted .cache
webui-docker-auto-1  | Mounted embeddings
webui-docker-auto-1  | Mounted config.json
webui-docker-auto-1  | Mounted models
webui-docker-auto-1  | Mounted styles.csv
webui-docker-auto-1  | Mounted ui-config.json
webui-docker-auto-1  | Mounted extensions
webui-docker-auto-1  | Installing extension dependencies (if any)
webui-docker-auto-1  | Traceback (most recent call last):
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1086, in _get_module
webui-docker-auto-1  |     return importlib.import_module("." + module_name, self.__name__)
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
webui-docker-auto-1  |     return _bootstrap._gcd_import(name[level:], package, level)
webui-docker-auto-1  |   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
webui-docker-auto-1  |   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
webui-docker-auto-1  |   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
webui-docker-auto-1  |   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
webui-docker-auto-1  |   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
webui-docker-auto-1  |   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 85, in <module>
webui-docker-auto-1  |     from accelerate import __version__ as accelerate_version
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/accelerate/__init__.py", line 3, in <module>
webui-docker-auto-1  |     from .accelerator import Accelerator
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/accelerate/accelerator.py", line 35, in <module>
webui-docker-auto-1  |     from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/accelerate/checkpointing.py", line 24, in <module>
webui-docker-auto-1  |     from .utils import (
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/accelerate/utils/__init__.py", line 131, in <module>
webui-docker-auto-1  |     from .bnb import has_4bit_bnb_layers, load_and_quantize_model
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/accelerate/utils/bnb.py", line 42, in <module>
webui-docker-auto-1  |     import bitsandbytes as bnb
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
webui-docker-auto-1  |     from .autograd._functions import (
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 5, in <module>
webui-docker-auto-1  |     import bitsandbytes.functional as F
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/functional.py", line 13, in <module>
webui-docker-auto-1  |     from .cextension import COMPILED_WITH_CUDA, lib
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 113, in <module>
webui-docker-auto-1  |     lib = CUDASetup.get_instance().lib
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 109, in get_instance
webui-docker-auto-1  |     cls._instance.initialize()
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 59, in initialize
webui-docker-auto-1  |     binary_name, cudart_path, cuda, cc, cuda_version_string = evaluate_cuda_setup()
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 125, in evaluate_cuda_setup
webui-docker-auto-1  |     cuda_version_string = get_cuda_version(cuda, cudart_path)
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py", line 45, in get_cuda_version
webui-docker-auto-1  |     check_cuda_result(cuda, cudart.cudaRuntimeGetVersion(ctypes.byref(version)))
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
webui-docker-auto-1  |     func = self.__getitem__(name)
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
webui-docker-auto-1  |     func = self._FuncPtr((name_or_ordinal, self))
webui-docker-auto-1  | AttributeError: python: undefined symbol: cudaRuntimeGetVersion
webui-docker-auto-1  | 
webui-docker-auto-1  | 
webui-docker-auto-1  | The above exception was the direct cause of the following exception:
webui-docker-auto-1  | 
webui-docker-auto-1  | Traceback (most recent call last):
webui-docker-auto-1  |   File "/stable-diffusion-webui/webui.py", line 39, in <module>
webui-docker-auto-1  |     import pytorch_lightning   # noqa: F401 # pytorch_lightning should be imported after torch, but it re-enables warnings on import so import once to disable them
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/__init__.py", line 35, in <module>
webui-docker-auto-1  |     from pytorch_lightning.callbacks import Callback  # noqa: E402
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
webui-docker-auto-1  |     from pytorch_lightning.callbacks.batch_size_finder import BatchSizeFinder
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/callbacks/batch_size_finder.py", line 24, in <module>
webui-docker-auto-1  |     from pytorch_lightning.callbacks.callback import Callback
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/callbacks/callback.py", line 25, in <module>
webui-docker-auto-1  |     from pytorch_lightning.utilities.types import STEP_OUTPUT
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/pytorch_lightning/utilities/types.py", line 27, in <module>
webui-docker-auto-1  |     from torchmetrics import Metric
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/torchmetrics/__init__.py", line 14, in <module>
webui-docker-auto-1  |     from torchmetrics import functional  # noqa: E402
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/torchmetrics/functional/__init__.py", line 120, in <module>
webui-docker-auto-1  |     from torchmetrics.functional.text._deprecated import _bleu_score as bleu_score
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/torchmetrics/functional/text/__init__.py", line 50, in <module>
webui-docker-auto-1  |     from torchmetrics.functional.text.bert import bert_score  # noqa: F401
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/torchmetrics/functional/text/bert.py", line 23, in <module>
webui-docker-auto-1  |     from torchmetrics.functional.text.helper_embedding_metric import (
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/torchmetrics/functional/text/helper_embedding_metric.py", line 27, in <module>
webui-docker-auto-1  |     from transformers import AutoModelForMaskedLM, AutoTokenizer, PreTrainedModel, PreTrainedTokenizerBase
webui-docker-auto-1  |   File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1076, in __getattr__
webui-docker-auto-1  |     module = self._get_module(self._class_to_module[name])
webui-docker-auto-1  |   File "/usr/local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1088, in _get_module
webui-docker-auto-1  |     raise RuntimeError(
webui-docker-auto-1  | RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
webui-docker-auto-1  | python: undefined symbol: cudaRuntimeGetVersion
webui-docker-auto-1 exited with code 1
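
For context on what the trace means: bitsandbytes resolves CUDA symbols with ctypes against a handle to the running process (which is why the error is prefixed with `python:`), and the lookup only succeeds if a CUDA runtime library has actually been loaded into that process. A minimal sketch of that failing pattern (an illustration, not the bitsandbytes code itself):

```python
import ctypes

def cuda_runtime_visible() -> bool:
    # CDLL(None) returns a handle to the main program, so a failed symbol
    # lookup reports the executable name, e.g. "python: undefined symbol: ...".
    try:
        ctypes.CDLL(None).cudaRuntimeGetVersion
        return True
    except AttributeError:
        return False

if __name__ == "__main__":
    print("CUDA runtime loaded into this process:", cuda_runtime_visible())
```

On a machine where no libcudart has been loaded into the interpreter, the function returns False, which is roughly the situation the traceback above reports.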
kienerj commented 1 year ago

How can I delete an extension when I can't start the container? Just remove the folder?

simonmcnair commented 1 year ago

How can I delete an extension when I can't start the container? Just remove the folder?

yes

kienerj commented 1 year ago

That did change the error; now I get:

RuntimeError: No CUDA GPUs are available

So I removed all images and containers and regenerated, and I still get this same error. A basic CUDA container from NVIDIA works fine, so a GPU is actually available, including the NVIDIA container toolkit. CUDA also works directly without Docker.
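
One way to narrow this down from inside the container (a diagnostic sketch, not part of this repo) is to ask the dynamic loader whether it can see the CUDA runtime at all. As far as I know, the NVIDIA container toolkit only injects driver libraries such as libcuda; libcudart has to come from the image itself (the CUDA toolkit or a pip wheel):

```python
import ctypes
import ctypes.util

# Diagnostic sketch: if the loader cannot locate libcudart, bitsandbytes
# has no way to resolve cudaRuntimeGetVersion, regardless of the driver.
name = ctypes.util.find_library("cudart")
if name is None:
    print("libcudart is not on the default library search path")
else:
    ctypes.CDLL(name)  # loadable; bitsandbytes should be able to find it
    print(f"found CUDA runtime: {name}")
```

Running this inside the failing container should tell you whether the problem is a missing runtime library rather than a missing GPU.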

simonmcnair commented 1 year ago

It is important to remember that once you install an extension, its requirements and Python packages are installed into the container, where they persist and can break it. If you delete an extension and still have issues, you really should rebuild the container.

I have had the CUDA error loads of times, and it has always been an extension, and a rebuild has always fixed it.

I have given up on the dreambooth extension tbh. I now use a colab instead.

DevilaN commented 1 year ago

What's up with rebuilding the image? It should not be necessary at all. What you need is to get rid of the old container that has unwanted modifications (made by rogue extensions). Just go with docker compose --profile auto down and then start as normal.

simonmcnair commented 1 year ago

I'm sure you're right; I just remember seeing messages in the log files, in the past, saying packages were 'already installed' when I'm pretty sure they weren't part of the default container.

Either way, if there are CUDA errors, try moving the extensions to a different directory and try and figure out which one is causing it.
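
A small helper along these lines can make that bisection less tedious; the extensions path is an assumption based on this thread (data/config/auto/extensions), so adjust it to your layout:

```python
import shutil
from pathlib import Path

def disable_extensions(ext_dir: str, disabled_dir: str) -> list[str]:
    """Move every extension directory aside so they can be restored one at
    a time to find the one breaking the container."""
    src, dst = Path(ext_dir), Path(disabled_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for ext in sorted(p for p in src.glob("*") if p.is_dir()):
        shutil.move(str(ext), str(dst / ext.name))
        moved.append(ext.name)
    return moved

if __name__ == "__main__":
    # Paths are an assumption from this thread; adjust to your setup.
    print(disable_extensions("data/config/auto/extensions", "extensions-disabled"))
```

After moving them all aside, restore extensions one at a time (restarting the container each time) until the error comes back.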

kienerj commented 1 year ago

I cleaned all images and pulled the git repo from scratch, i.e. started fresh. I still get the error, so it's not just dependencies.

dstout-devops commented 1 year ago

I cleaned all images and pulled the git repo from scratch, i.e. started fresh. I still get the error, so it's not just dependencies.

In my case, deleting the sd-dreambooth extension out of your data/extensions dir resolves this. It could be another extension, but that's the most popular one that uses 'bitsandbytes' and produces this error.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 7 days.

hinate13 commented 1 year ago

I ran into this issue as well. I was able to remove the DreamBooth install to get the Auto container back online. However, I'm curious whether there's a current workaround (to get DreamBooth working, or a recommended comparable alternative extension)?

I tried following @dstout-devops's idea:

I can replicate this issue by installing the dreambooth extension, for what it's worth.

EDIT: This appears related to cuda and bitsandbytes. I was able to work around the issue by using the 'pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime' container as a base.

Changing services\AUTOMATIC1111\Dockerfile from:

FROM python:3.10.9-slim

to:

FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime

after rebuilding the container, it fails with a different error:

 => ERROR [auto stage-2 13/14] RUN   python3 /docker/info.py /stable-diffusion-webui/modules/ui.py &&   mv /stable-diffusion-webui/style.css /stable-dif  0.5s
------
 > [auto stage-2 13/14] RUN   python3 /docker/info.py /stable-diffusion-webui/modules/ui.py &&   mv /stable-diffusion-webui/style.css /stable-diffusion-webui/user.css &&   sed -i 's/in_app_dir = .*/in_app_dir = True/g' /usr/local/lib/python3.10/site-packages/gradio/routes.py &&   git config --global --add safe.directory '*':
0.455 Traceback (most recent call last):
0.455   File "/docker/info.py", line 6, in <module>
0.455     file.read_text()\
0.455   File "/opt/conda/lib/python3.10/pathlib.py", line 1134, in read_text
0.455     with self.open(mode='r', encoding=encoding, errors=errors) as f:
0.455   File "/opt/conda/lib/python3.10/pathlib.py", line 1119, in open
0.455     return self._accessor.open(self, mode, buffering, encoding, errors,
0.455 FileNotFoundError: [Errno 2] No such file or directory: '/stable-diffusion-webui/modules/ui.py'
------
failed to solve: process "/bin/sh -c python3 /docker/info.py ${ROOT}/modules/ui.py &&   mv ${ROOT}/style.css ${ROOT}/user.css &&   sed -i 's/in_app_dir = .*/in_app_dir = True/g' /usr/local/lib/python3.10/site-packages/gradio/routes.py &&   git config --global --add safe.directory '*'" did not complete successfully: exit code: 1
paranoidd commented 1 year ago

To anyone who encounters this issue in combination with the dreambooth extension (https://github.com/d8ahazard/sd_dreambooth_extension) - you are not alone, here's what helped me:

  1. Installing newer bitsandbytes version (per https://github.com/d8ahazard/sd_dreambooth_extension/issues/1326#issuecomment-1694685134)

    diff --git a/postinstall.py b/postinstall.py
    index 5ff5d26..99efd5d 100644
    --- a/postinstall.py
    +++ b/postinstall.py
    @@ -149,7 +149,7 @@ def check_versions():
         Dependency(module="accelerate", version="0.17.1"),
         Dependency(module="diffusers", version="0.14.0"),
         Dependency(module="transformers", version="4.25.1"),
    -        Dependency(module="bitsandbytes",  version="0.35.4", version_comparison="exact"),
    +        Dependency(module="bitsandbytes",  version="0.41.1", version_comparison="exact"),
     ]
    
     launch_errors = []
    diff --git a/requirements.txt b/requirements.txt
    index 451739e..0a40de8 100644
    --- a/requirements.txt
    +++ b/requirements.txt
    @@ -1,5 +1,5 @@
    accelerate~=0.21.0
    -bitsandbytes==0.35.4
    +bitsandbytes==0.41.1
    dadaptation==3.1
    diffusers~=0.19.3
    discord-webhook~=1.1.0
  2. Install the CUDA toolkit inside the container - some crucial library was missing. I added this to the stage that builds the final image:

    RUN apt-get update && \
    apt-get install -y software-properties-common wget && \
    wget https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.1-1_all.deb -O /tmp/cuda.deb && \
    dpkg -i /tmp/cuda.deb && \
    rm /tmp/cuda.deb && \
    add-apt-repository contrib && \
    apt-get update && \
    apt-get -y install cuda && \
    apt-get clean
  3. Set BNB_CUDA_VERSION=122 environment variable (I am not running it through compose but adding it here should be enough: https://github.com/AbdBarho/stable-diffusion-webui-docker/blob/master/docker-compose.yml#L33)


I build the Docker image with:

docker compose build auto

My final runtime command looks like:

docker run -it --rm --name automatic1111 \
  -e CLI_ARGS="--allow-code --medvram --xformers --enable-insecure-extension-access --api" \
  -e BNB_CUDA_VERSION=122 \
  -v ./data:/data \
  -v ./output:/output \
  -p 7860:7860 \
  --gpus all \
  docker.io/library/sd-auto:67

Hope this spares someone the hassle I went through over the past few days :laughing:

hinate13 commented 1 year ago

Hi @paranoidd, I cleared out all the extensions from the auto1111 folder (data\config\auto\extensions) and tried the changes you suggested, but I'm running into an odd NVIDIA error when the container tries to start:


 => => naming to docker.io/library/sd-auto:67                                                                      0.0s
[+] Running 2/2
 ✔ Network webui-docker_default   Created                                                                          0.1s
 ✔ Container webui-docker-auto-1  Created                                                                          0.1s
Attaching to webui-docker-auto-1
Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/ecb28cfcf3bf29c15dc6c6ffeeaa4d9b0f443499e11c87c52d486fa9e0993cff/merged/usr/bin/nvidia-smi: file exists: unknown

Unfortunately, I'm fairly new to the nvidia setup requirements for docker so I'm a little clueless as to where to go from here :/

Here's the git diff of the stable-diffusion-webui-docker:

diff --git a/docker-compose.yml b/docker-compose.yml
index 93fba1d..f20c1db 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -32,6 +32,7 @@ services:
     image: sd-auto:67
     environment:
       - CLI_ARGS=--allow-code --medvram --xformers --enable-insecure-extension-access --api
+      - BNB_CUDA_VERSION=122

   auto-cpu:
     <<: *automatic

diff --git a/services/AUTOMATIC1111/Dockerfile b/services/AUTOMATIC1111/Dockerfile
index f380f30..17ee641 100644
--- a/services/AUTOMATIC1111/Dockerfile
+++ b/services/AUTOMATIC1111/Dockerfile
@@ -31,6 +31,15 @@ RUN --mount=type=cache,target=/var/cache/apt \
   # extensions needs those
   ffmpeg libglfw3-dev libgles2-mesa-dev pkg-config libcairo2 libcairo2-dev build-essential

+RUN apt-get update && \
+    apt-get install -y software-properties-common wget && \
+    wget https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.1-1_all.deb -O /tmp/cuda.deb && \
+    dpkg -i /tmp/cuda.deb && \
+    rm /tmp/cuda.deb && \
+    add-apt-repository contrib && \
+    apt-get update && \
+    apt-get -y install cuda && \
+    apt-get clean

 RUN --mount=type=cache,target=/cache --mount=type=cache,target=/root/.cache/pip \
   aria2c -x 5 --dir /cache --out torch-2.0.1-cp310-cp310-linux_x86_64.whl -c \

Note I didn't even get to modifying postinstall.py for the bitsandbytes dependency inside the dreambooth extension. I did try deleting all my images and rebuilding the container, but it had the same result.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 7 days with no activity.