FyzzLive opened this issue 6 months ago
Just add runtime: nvidia to the docker-compose.yaml.
Now getting services.runtime must be a mapping
#docker-compose.yaml
runtime: nvidia
environment:
# Don't enclose values in quotes
- DIRECT_ADDRESS=${DIRECT_ADDRESS:-127.0.0.1}
- DIRECT_ADDRESS_GET_WAN=${DIRECT_ADDRESS_GET_WAN:-false}
...
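For what it's worth, the services.runtime must be a mapping error usually means runtime: ended up directly under services: rather than nested inside a specific service. A sketch of the intended placement, assuming the service is named supervisor (the name is a guess based on the log prefix; use whatever your compose file actually calls it):

```yaml
services:
  supervisor:          # hypothetical service name
    runtime: nvidia    # must sit at the same level as image:, environment:, etc.
    environment:
      # Don't enclose values in quotes
      - DIRECT_ADDRESS=${DIRECT_ADDRESS:-127.0.0.1}
      - DIRECT_ADDRESS_GET_WAN=${DIRECT_ADDRESS_GET_WAN:-false}
```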
Any update on this?
Running docker compose up
from a freshly cloned repo and getting the same issue.
System: Ubuntu 20.04, Nvidia RTX 2070. It seems to spin up everything except Automatic1111.
Error when running compose:
supervisor-1 | ==> /var/log/supervisor/webui.log <==
supervisor-1 | INFO: Shutting down
supervisor-1 | INFO: Waiting for application shutdown.
supervisor-1 | INFO: Application shutdown complete.
supervisor-1 | INFO: Finished server process [407]
supervisor-1 | Starting A1111 SD Web UI...
supervisor-1 | Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
supervisor-1 | Version: v1.9.3
supervisor-1 | Commit hash: 1c0a0c4c26f78c32095ebc7f8af82f5c04fca8c0
supervisor-1 | 2024-04-23 11:34:28,438 INFO exited: webui (exit status 1; not expected)
supervisor-1 |
supervisor-1 | ==> /var/log/supervisor/supervisor.log <==
supervisor-1 | 2024-04-23 11:34:28,438 INFO exited: webui (exit status 1; not expected)
supervisor-1 |
supervisor-1 | ==> /var/log/supervisor/webui.log <==
supervisor-1 | Traceback (most recent call last):
supervisor-1 | File "/workspace/stable-diffusion-webui/launch.py", line 48, in <module>
supervisor-1 | main()
supervisor-1 | File "/workspace/stable-diffusion-webui/launch.py", line 39, in main
supervisor-1 | prepare_environment()
supervisor-1 | File "/workspace/stable-diffusion-webui/modules/launch_utils.py", line 386, in prepare_environment
supervisor-1 | raise RuntimeError(
supervisor-1 | RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
supervisor-1 | 2024-04-23 11:34:29,109 INFO spawned: 'webui' with pid 876
Changing the docker-compose.yml file and adding runtime: nvidia
bombs too, as Docker doesn't recognise it: Error response from daemon: unknown or invalid runtime name: nvidia
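That particular error generally means the NVIDIA Container Toolkit runtime was never registered with the Docker daemon. On recent installs, sudo nvidia-ctk runtime configure --runtime=docker followed by a daemon restart performs the registration; the resulting /etc/docker/daemon.json looks roughly like this (a sketch, assuming the toolkit itself is already installed):

```json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```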
=====
This seems to be an issue with running the Auto Web UI itself, as folks are talking about it there as well. Proposed solution here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/1742#issuecomment-1297210099
I tried adding this to the build/COPY_ROOT/opt/ai-dock/bin/update-webui.sh
file, before the Web UI is started, but it doesn't really do anything.
===
rmmod, lsmod, and modprobe
aren't packaged with the linux/amd64 platform image.
Jumped into the running container and there's nothing in /sbin.
So I installed them manually in the container with sudo apt install linux-generic -y
(for testing purposes), but rmmod nvidia-uvm
isn't possible because the module is in use at that point.
======
I have downloaded and run a different tag as well: https://github.com/ai-dock/stable-diffusion-webui/pkgs/container/stable-diffusion-webui/205862819?tag=cuda-12.1.1-runtime-22.04-v1.8.0
Exact same behaviour unfortunately.
======
Last-ditch attempt: set AUTO_UPDATE=true
in the hope that something had been fixed in the latest version of AUTOMATIC1111... nope :(
=======
Basically at a dead end unless I want to run this in "CPU mode" .. which I don't.
Per this Docker article about giving access to the GPU, I added the following to docker-compose.yaml
after getting the same error, and things seem to be working now:
    devices:
      - "/dev/dri:/dev/dri"
      # For AMD GPU
      #- "/dev/kfd:/dev/kfd"
+
+   deploy:
+     resources:
+       reservations:
+         devices:
+           - driver: nvidia
+             count: 1
+             capabilities: [gpu]
    volumes:
      # Workspace
      - ./workspace:${WORKSPACE:-/workspace/}:rshared
A question I have is why does it work in the cloud but not locally.
Why doesn't it need to have an explicit GPU configuration when in the cloud?
The base image's docker-compose.yml
is perhaps worth looking at: https://github.com/ai-dock/base-image/blob/main/docker-compose.yaml
A question I have is why does it work in the cloud but not locally.
Why doesn't it need to have an explicit GPU configuration when in the cloud?
It really depends on the cloud provider. These images are particularly suited to providers where the user is given a single docker image to run in a preconfigured docker environment. In that case all you need to set is the envs and ports; everything else is done for you.
The only downside, particularly if you subscribe to the 'docker way' of one process per container, is that there's no service orchestration: lots of services/apps get bundled into the one image, managed by supervisord rather than by Docker directly.
I will hopefully give more attention to local use soon when I am able to test in full virtual machine environments.
Running Locally: Torch is not able to use GPU

Pulled the repo, edited the env, added my own provisioning script, and ran
docker compose up
Received this error after the build process as it was starting:

RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
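As a side note, the flag the error mentions only masks the problem by skipping the CUDA self-test, leaving the Web UI on CPU. If anyone does want that, a sketch of where it would go, assuming the compose file forwards COMMANDLINE_ARGS into the container (the variable name comes from the error message itself, not from the ai-dock docs):

```yaml
environment:
  # Hypothetical: skips the CUDA check so the Web UI starts in CPU mode (slow)
  - COMMANDLINE_ARGS=--skip-torch-cuda-test
```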