TabbyML / tabby

Self-hosted AI coding assistant
https://tabby.tabbyml.com/

CUDA error 711 at /root/workspace/crates/llama-cpp-bindings/llama.cpp/ggml-cuda.cu:6826: peer mapping resources exhausted #1162

Open QIN2DIM opened 9 months ago

QIN2DIM commented 9 months ago

Describe the bug

When running Tabby with Docker Compose, the server fails with: CUDA error 711 at /root/workspace/crates/llama-cpp-bindings/llama.cpp/ggml-cuda.cu:6826: peer mapping resources exhausted

bug

Information about your version

tabbyml/tabby:latest    23bdb48b7956   2 weeks ago     

Information about your GPU

cuda

Additional context

version: '3.5'

services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model TabbyML/DeepseekCoder-6.7B --device cuda
    volumes:
      - "$HOME/.tabby:/data"
    environment:
      NVIDIA_VISIBLE_DEVICES: all
      HTTPS_PROXY: http://127.0.0.1:2081
      HTTP_PROXY: http://127.0.0.1:2081
    ports:
      - 9999:8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 9 # Setting this to 'all' or 10 triggers the error; with other values everything works normally.
              capabilities: [gpu]

wsxiaoys commented 9 months ago

Since Tabby only utilizes a single GPU, could you try passing a single GPU device (e.g., device 0) to the Docker container and see if it works?
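
A minimal sketch of that suggestion, assuming the Compose file from the report: the `count` key in the device reservation is swapped for `device_ids`, which Docker Compose accepts as a list of GPU IDs (the two keys cannot be set together). The ID '0' is only an example, and this has not been confirmed to resolve the error. Whether the `NVIDIA_VISIBLE_DEVICES: all` entry from the original file also needs narrowing is an open question here.

```yaml
services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model TabbyML/DeepseekCoder-6.7B --device cuda
    volumes:
      - "$HOME/.tabby:/data"
    ports:
      - 9999:8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0'] # expose only GPU 0 (example ID); replaces `count`
              capabilities: [gpu]
```

For background, CUDA error 711 corresponds to `cudaErrorTooManyPeers`, which is raised when the hardware resources for peer access between devices run out. That would be consistent with the failure appearing only once all ten GPUs are visible to the container, although the thread does not confirm the cause.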