LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Docker container CUDA not working #1015

Closed Firestorm7893 closed 3 months ago

Firestorm7893 commented 3 months ago

Describe the Issue
I cannot get the Docker container of koboldcpp to detect the CUDA installation on the host; e.g. nvidia-smi always returns "command not found" :(.

I'm trying to set everything up through Docker Compose. I know my Docker setup works, since I have other containers that use the GPU with no problems.

This is the compose file for koboldcpp:

  llamacpp:
    container_name: koboldcpp
    image: koboldai/koboldcpp:latest
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
    volumes:
      - /home/docker/services/ollama/llamacpp:/app/models
      - /home/docker/services/ollama/llamacpp/koboldcpp:/koboldcpp
    networks:
      - newNetwork
    environment:
      - KCPP_ARGS=--port="80" --model="/app/models/LLaMA2-13B-Tiefighter.Q4_K_S.gguf" --usecublas="normal "
    labels:
      traefik.http.routers.llm.rule: Host(`llm.silvaserv.it`)
      traefik.http.routers.llm.tls: true
      traefik.http.routers.llm.tls.certresolver: lets-encrypt
      traefik.port: 80
      traefik.docker.network: newNetwork
      traefik.enable: true

This is a compose file for another container where the GPU works:

  jellyfin:
    image: jellyfin/jellyfin
    container_name: jellyfin
    runtime: nvidia
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Rome
#      - NVIDIA_VISIBLE_DEVICES=ALL
#      - NVIDIA_CAPABILITIES=ALL
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
    healthcheck:
      disable: true

Additional Information:
System specs:

OS: Fedora Linux 40 (KDE Plasma) x86_64
Kernel: Linux 6.9.9-200.fc40.x86_64
DE: KDE Plasma 6.1.3
WM: KWin (Wayland)
CPU: Intel(R) Core(TM) i9-10980XE (36) @ 4.80 GHz
GPU: NVIDIA RTX A6000
Memory: 25.55 GiB / 62.58 GiB (41%)
Swap: 155.00 MiB / 125.15 GiB (0%)
Disk (/): 1.14 TiB / 1.82 TiB (63%) - btrfs
Disk (/mnt/Vault): 1.06 TiB / 1.79 TiB (59%) - ext4
Disk (/mnt/Vault2): 1.18 TiB / 1.79 TiB (66%) - ext4
Disk (/mnt/Windows): 265.73 GiB / 953.06 GiB (28%) - fuseblk

Should I build my own Docker image using NVIDIA's CUDA images as a base?

LostRuins commented 3 months ago

The Docker images are maintained by @henk717, so they might have to take a look. They are conda-based, I believe.

henk717 commented 3 months ago

It is certainly going to be your Docker setup; the bigger question is why. The KCPP_ARGS line looks wrong to me, so that may have something to do with it. I expect something like KCPP_ARGS=--usecublas --gpulayers 99 --model locationofgguf. Quoting each flag's value separately isn't what I do myself on the provider templates, and it's not a tested method.
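Roughly, the environment block would become something like this (same model path and port as your file, combined into one unquoted string; just a sketch, I have not tested that exact line):

    environment:
      # Single unquoted value; no per-flag quoting like --usecublas="normal "
      - KCPP_ARGS=--usecublas --gpulayers 99 --model /app/models/LLaMA2-13B-Tiefighter.Q4_K_S.gguf --port 80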

If nvidia-smi is missing, this is a Docker-side issue. The template doesn't ship with any CUDA-related driver files on purpose. In a healthy Docker setup, such as the ones on RunPod and Docker Desktop, Docker automatically injects the correct driver, including nvidia-smi, when you pass through the GPU.

So it's possible you don't have the correct GPU passthrough method. I know there are two competing ones for Linux, and I don't know if both inject the driver. To my knowledge the correct one is not the separate NVIDIA Docker runtime but the newer one that integrates into the official Docker.
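If you are on the old standalone runtime, the usual fix is NVIDIA's Container Toolkit, which hooks into the official Docker engine. Roughly, on Fedora (assuming NVIDIA's repository is already configured per their install docs):

  # Install the toolkit, then register it with the Docker engine
  sudo dnf install -y nvidia-container-toolkit
  sudo nvidia-ctk runtime configure --runtime=docker
  # Restart Docker so the new runtime configuration takes effect
  sudo systemctl restart docker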

Something like this should have nvidia-smi in a healthy Docker environment: docker run --gpus all -it debian bash. That uses the modern --gpus method rather than your old nvidia runtime method. If that works and you get nvidia-smi inside the Debian container (which our Docker image is based on), your setup is correct, and we can assume the docker-compose is wrong. Removing runtime: nvidia might be enough to get it to stop using the old, deprecated NVIDIA runtime.
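Spelled out as a one-shot check (if driver injection works this prints the GPU table, otherwise it fails with "command not found" just like inside the koboldcpp container):

  # Throwaway container; --gpus all triggers the modern driver injection
  docker run --rm --gpus all debian nvidia-smi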

henk717 commented 3 months ago

Update: This is the docker-compose that works for me:

version: "3.2"
services:
  koboldcpp:
    container_name: koboldcpp
    image: koboldai/koboldcpp:latest
    volumes:
      - ./:/content/:ro
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            device_ids: ['0']
            capabilities: [gpu]
    environment:
      - KCPP_ARGS=--model /content/model.gguf --usecublas --gpulayers 99 --multiuser 20 --quiet
    ports:
      - "5001:5001"

Firestorm7893 commented 3 months ago

With this configuration it works! (Maybe we could add this example to the docs?)

Thanks!

henk717 commented 3 months ago

Already added it as an integrated example: if the Docker container is launched without any environment variables, it now comes up with instructions. Those refer to docker run --rm -it koboldai/koboldcpp compose-example.
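For anyone finding this later, a minimal way to use the compose file from my earlier comment: save it as docker-compose.yml in a directory that also contains the model (named model.gguf to match the example), then:

  # Start the stack in the background and follow the koboldcpp logs
  docker compose up -d
  docker compose logs -f koboldcpp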