Closed dmarx closed 1 year ago
Let's see if this works.
EDIT: yeah... don't do this. That torch version is pinned for a reason.
New error now after unpinning torch:
=> [stage-2 15/15] WORKDIR /stable-diffusion-webui 0.0s
=> exporting to image 21.1s
=> => exporting layers 21.1s
=> => writing image sha256:500eb74eac4bb4c9d06516f9f971fdbee75013b509c002666788f73fbe08b742 0.0s
=> => naming to docker.io/library/sd-auto:51 0.0s
[+] Running 1/1
✔ Container webui-docker-auto-1 Created 0.2s
Attaching to webui-docker-auto-1
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
I think this means I'm missing my CUDA drivers?
Confirmed... I didn't have my CUDA stack configured :/
For posterity:
nvidia-smi works properly, and so does the hello-world NVIDIA Docker container, but I'm still getting the same error :(
Deleted and rebuilt the containers and images; still no luck.
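For anyone hitting the same `libnvidia-ml.so.1` error, two quick checks can narrow down whether the host driver or the Docker wiring is the problem (a diagnostic sketch, not from this thread; paths vary by distro):

```shell
# Is the driver's management library registered on the host at all?
# A working install lists something like:
#   libnvidia-ml.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-ml.so.1
ldconfig -p | grep libnvidia-ml

# Does the Docker daemon know about the nvidia runtime?
docker info | grep -i -A1 runtimes
```

If the first command prints nothing, the driver install is at fault; if the second shows no `nvidia` runtime, the container toolkit configuration is.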
=> => exporting layers 0.0s
=> => writing image sha256:500eb74eac4bb4c9d06516f9f971fdbee75013b509c002666788f73fbe08b742 0.0s
=> => naming to docker.io/library/sd-auto:51 0.0s
[+] Running 1/0
✔ Container webui-docker-auto-1 Created 0.0s
Attaching to webui-docker-auto-1
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
Tried sudo-ing the command, which seems to have at least gotten past the previous error. I believe the root of the problem is discussed here: https://github.com/NVIDIA/nvidia-container-toolkit/issues/154
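For context on that class of error: `nvidia-container-cli` reads its settings from `/etc/nvidia-container-runtime/config.toml`, and the entries that usually come up in these reports are `ldconfig` and `no-cgroups`. An excerpt of the stock file for reference (treat the exact values as distro-dependent, and note this is background, not a confirmed fix from the linked issue):

```toml
# /etc/nvidia-container-runtime/config.toml (excerpt)
[nvidia-container-cli]
# The leading '@' means "run the host's ldconfig"; some setups need the
# '.real' suffix dropped (or added) to match the binary actually present.
ldconfig = "@/sbin/ldconfig.real"
# Rootless Docker setups often need this flipped to true.
no-cgroups = false
```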
The services now build, but I'm getting an error when trying to run a test prompt with everything else set to defaults...
webui-docker-auto-1 | Running on local URL: http://0.0.0.0:7860
webui-docker-auto-1 |
webui-docker-auto-1 | To create a public link, set `share=True` in `launch()`.
webui-docker-auto-1 | Startup time: 13.9s (import gradio: 0.8s, import ldm: 0.4s, other imports: 1.2s, load scripts: 0.2s, load SD checkpoint: 10.9s, create ui: 0.1s).
webui-docker-auto-1 | Error completing request
webui-docker-auto-1 | Arguments: ('task(td9v3amy7jrkdya)', 'a delicious cheeseburger', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, '', False, False, 'positive', 'comma', 0, False, False, '', 1, '', 0, '', 0, '', True, False, False, False, 0) {}
webui-docker-auto-1 | Traceback (most recent call last):
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/call_queue.py", line 56, in f
webui-docker-auto-1 | res = list(func(*args, **kwargs))
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/call_queue.py", line 37, in f
webui-docker-auto-1 | res = func(*args, **kwargs)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/txt2img.py", line 56, in txt2img
webui-docker-auto-1 | processed = process_images(p)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/processing.py", line 486, in process_images
webui-docker-auto-1 | res = process_images_inner(p)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/processing.py", line 625, in process_images_inner
webui-docker-auto-1 | uc = get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, p.steps, cached_uc)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/processing.py", line 570, in get_conds_with_caching
webui-docker-auto-1 | cache[1] = function(shared.sd_model, required_prompts, steps)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/prompt_parser.py", line 140, in get_learned_conditioning
webui-docker-auto-1 | conds = model.get_learned_conditioning(texts)
webui-docker-auto-1 | File "/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 669, in get_learned_conditioning
webui-docker-auto-1 | c = self.cond_stage_model(c)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
webui-docker-auto-1 | return forward_call(*input, **kwargs)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/sd_hijack_clip.py", line 229, in forward
webui-docker-auto-1 | z = self.process_tokens(tokens, multipliers)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/sd_hijack_clip.py", line 254, in process_tokens
webui-docker-auto-1 | z = self.encode_with_transformers(tokens)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/sd_hijack_clip.py", line 302, in encode_with_transformers
webui-docker-auto-1 | outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1201, in _call_impl
webui-docker-auto-1 | result = hook(self, input)
webui-docker-auto-1 | File "/stable-diffusion-webui/modules/lowvram.py", line 35, in send_me_to_gpu
webui-docker-auto-1 | module.to(devices.device)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 989, in to
webui-docker-auto-1 | return self._apply(convert)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
webui-docker-auto-1 | module._apply(fn)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
webui-docker-auto-1 | module._apply(fn)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
webui-docker-auto-1 | module._apply(fn)
webui-docker-auto-1 | [Previous line repeated 2 more times]
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 664, in _apply
webui-docker-auto-1 | param_applied = fn(param)
webui-docker-auto-1 | File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 987, in convert
webui-docker-auto-1 | return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
webui-docker-auto-1 | RuntimeError: CUDA error: unspecified launch failure
webui-docker-auto-1 | CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
webui-docker-auto-1 | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
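Taking the log's own suggestion: forcing synchronous kernel launches makes the traceback point at the op that actually failed instead of an asynchronous report. A sketch, assuming the compose file forwards environment variables to the service (it may not; in that case add the variable under the service's `environment:` key instead):

```shell
# Surface the true source line of the CUDA error for debugging
CUDA_LAUNCH_BLOCKING=1 docker compose --profile auto up --build
```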
The first error you got was just a timeout caused by a flaky internet connection; if you try building again it should be fixed (hopefully). Please keep PyTorch pinned, otherwise you'll get a lot of unexpected errors.
The second error seems weird. What is the output of this command?
docker run --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
If you get the same error, then it's probably a problem with Docker not being able to see your GPU.
Make sure you have the NVIDIA Container Toolkit installed and working.
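For reference, the usual install-and-wire-up sequence on an apt-based distro looks roughly like this (per NVIDIA's toolkit docs; package names and steps differ on other distros, and this assumes the NVIDIA apt repository is already configured):

```shell
# Install the toolkit, register the nvidia runtime with Docker,
# then restart the daemon so the change takes effect
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```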
I think the issue might have been that I had nvidia-container-toolkit-base installed as well. I uninstalled both, reinstalled nvidia-container-toolkit, restarted, and I've got the test image generating successfully now. Not sure whether the fix was removing that package or just the restart. I'm only able to get Docker to see my GPU when I run with sudo, though, which I'm not a huge fan of... Anyway, it looks like the issue was me not realizing I'd skipped the prerequisites on a too-fresh Ubuntu reinstall.
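On the sudo point: the standard (if blunt) way to run Docker without sudo is membership in the `docker` group. A sketch, not from this thread, with the caveat that docker-group membership is effectively root-equivalent access:

```shell
# Allow the current user to talk to the Docker daemon without sudo
sudo usermod -aG docker "$USER"
# Log out and back in (or run `newgrp docker`) for the group to apply,
# then re-test GPU visibility:
docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
```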
I've just had this issue too on Ubuntu 23.04. I fixed it by re-installing nvidia-container-toolkit!
Has this issue been opened before?
Describe the bug
First attempt at building. The `docker compose --profile download up --build` step worked fine. Attempting to run `docker compose --profile auto up --build` resulted in the following error.

Which UI
auto
Hardware / Software
Server: Docker Desktop 4.18.0 (104112)
 Engine:
  Version: 20.10.24
  API version: 1.41 (minimum version 1.12)
  Go version: go1.19.7
  Git commit: 5d6db84
  Built: Tue Apr 4 18:18:42 2023
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.6.18
  GitCommit: 2456e983eb9e37e47538f59ea18f2043c9a73640
 runc:
  Version: 1.1.4
  GitCommit: v1.1.4-0-g5fd4c4d
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0