getumbrel / llama-gpt

A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!
https://apps.umbrel.com/app/llama-gpt
MIT License

Can't use GPU: could not select device driver "nvidia" with capabilities: [[gpu]] #150

Open · ProgrammingLife opened this issue 8 months ago

ProgrammingLife commented 8 months ago

I've successfully used my RTX 3080 Ti with Stable Diffusion, Fooocus, and Stable Cascade, so my system is ready to work with the GPU. Arch Linux.

$ ./run.sh --model 7b --with-cuda
...
 ✔ Container llama-gpt-llama-gpt-ui-1             Recreated                                                                                                                      0.3s 
 ✔ Container llama-gpt-llama-gpt-api-cuda-ggml-1  Recreated                                                                                                                      0.3s 
Attaching to llama-gpt-api-cuda-ggml-1, llama-gpt-ui-1
llama-gpt-ui-1             | [INFO  wait] --------------------------------------------------------
llama-gpt-ui-1             | [INFO  wait]  docker-compose-wait 2.12.1
llama-gpt-ui-1             | [INFO  wait] ---------------------------
llama-gpt-ui-1             | [DEBUG wait] Starting with configuration:
llama-gpt-ui-1             | [DEBUG wait]  - Hosts to be waiting for: [llama-gpt-api-cuda-ggml:8000]
llama-gpt-ui-1             | [DEBUG wait]  - Paths to be waiting for: []
llama-gpt-ui-1             | [DEBUG wait]  - Timeout before failure: 3600 seconds 
llama-gpt-ui-1             | [DEBUG wait]  - TCP connection timeout before retry: 5 seconds 
llama-gpt-ui-1             | [DEBUG wait]  - Sleeping time before checking for hosts/paths availability: 0 seconds
llama-gpt-ui-1             | [DEBUG wait]  - Sleeping time once all hosts/paths are available: 0 seconds
llama-gpt-ui-1             | [DEBUG wait]  - Sleeping time between retries: 1 seconds
llama-gpt-ui-1             | [DEBUG wait] --------------------------------------------------------
llama-gpt-ui-1             | [INFO  wait] Checking availability of host [llama-gpt-api-cuda-ggml:8000]
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]
Gracefully stopping... (press Ctrl+C again to force)

$ nvidia-smi
Sat Feb 24 17:49:50 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   36C    P8              10W / 125W |     10MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2044      G   /usr/lib/Xorg                                 4MiB |
+---------------------------------------------------------------------------------------+

What should I check?
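That error usually means Docker cannot find the NVIDIA container runtime at all, rather than anything being wrong with the driver itself. A quick sanity check (the CUDA image tag below is only an example; use any tag your driver supports):

$ docker info | grep -i runtimes        # "nvidia" should appear in this list
$ docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi

If the nvidia runtime is missing from the first command's output, installing and configuring the NVIDIA Container Toolkit (see the replies below) is the usual fix.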

henryruhs commented 8 months ago
nvidia-ctk runtime configure
systemctl restart docker
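For anyone copying these verbatim: on most distros they need root, and the explicit form (the --runtime=docker flag targets the default runtime, so it may be redundant on your setup) is roughly:

$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker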
Haui1112 commented 7 months ago

This worked for me, thanks! Now I'm stuck at "forward compatibility was attempted on non supported HW":

llama-gpt-api-cuda-ggml-1  | /models/llama-2-7b-chat.bin model found.
llama-gpt-api-cuda-ggml-1  | make: *** No rule to make target 'build'. Stop.
llama-gpt-api-cuda-ggml-1  | Initializing server with:
llama-gpt-api-cuda-ggml-1  | Batch size: 2096
llama-gpt-api-cuda-ggml-1  | Number of CPU threads: 24
llama-gpt-api-cuda-ggml-1  | Number of GPU layers: 10
llama-gpt-api-cuda-ggml-1  | Context window: 4096
llama-gpt-api-cuda-ggml-1  | CUDA error 804 at /tmp/pip-install-7rxfzzup/llama-cpp-python_c62cf07cbfa449a7b268f9102316d6db/vendor/llama.cpp/ggml-cuda.cu:4883: forward compatibility was attempted on non supported HW

Setup:
System:    Host: TowerPC  Kernel: 6.1.0-18-amd64 arch: x86_64 bits: 64  Desktop: KDE Plasma v: 5.27.5
           Distro: Debian GNU/Linux 12 (bookworm)
Machine:   Type: Desktop  Mobo: ASUSTeK model: PRIME X299-A II v: Rev 1.xx serial:
           BIOS: American Megatrends v: 0702 date: 06/10/2020
CPU:       Info: 12-core model: Intel Core i9-10920X bits: 64 type: MT MCP cache: L2: 12 MiB
           Speed (MHz): avg: 1201 min/max: 1200/4600:4800:4700 cores: 1: 1200 2: 1201 3: 1200 4: 1200
           5: 1200 6: 1200 7: 1206 8: 1200 9: 1200 10: 1232 11: 1200 12: 1200 13: 1200 14: 1200 15: 1200
           16: 1200 17: 1200 18: 1200 19: 1200 20: 1200 21: 1200 22: 1200 23: 1200 24: 1200
Graphics:  Device-1: NVIDIA GA104 [GeForce RTX 3060 Ti Lite Hash Rate] driver: nvidia v: 525.147.05
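CUDA error 804 ("forward compatibility was attempted on non supported HW") typically means the CUDA runtime inside the container is newer than what the host driver supports (driver 525.xx reports CUDA 12.0, and the cuda-ggml image may be built against a newer toolkit). A rough way to compare the two sides (container name taken from the attach log earlier in this thread; adjust if yours differs):

$ nvidia-smi --query-gpu=driver_version --format=csv,noheader              # host driver
$ docker exec llama-gpt-llama-gpt-api-cuda-ggml-1 env | grep CUDA_VERSION  # container, if based on an nvidia/cuda image

If the container's CUDA version is ahead of what the driver supports, updating the host driver (or pinning an older image) usually clears the 804 error.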

kmanwar89 commented 7 months ago

I'm seeing this same error on a 1080 Ti, but when issuing nvidia-smi, the result is:

❯ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 550.54

I followed the instructions from NVIDIA's CUDA page here to install the CUDA drivers, but I'm guessing there's a driver mismatch somewhere...

Edit: Miraculously, a reboot didn't break my system, and I now see similar output from the nvidia-smi command. However, I don't have an nvidia-ctk command I can run, so I remain stuck(ish).

Edit 2: A quick search indicated I needed to install the NVIDIA Container Toolkit and restart Docker. The errors have now gone away.
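For Debian/Ubuntu, the missing piece is roughly the following (the apt repository for the toolkit has to be added first, per NVIDIA's container-toolkit install docs; the package name is assumed to be nvidia-container-toolkit):

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker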

I was able to bring up docker-compose-cuda-ggml.yml using docker compose -f docker-compose-cuda-ggml.yml up -d; however, the other CUDA compose file (gguf) did not work for me.
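To confirm the running API container actually sees the GPU (container name taken from the logs earlier in this thread; docker ps shows the real one on your machine):

$ docker compose -f docker-compose-cuda-ggml.yml ps
$ docker exec llama-gpt-llama-gpt-api-cuda-ggml-1 nvidia-smi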