CUDA error 711 at /root/workspace/crates/llama-cpp-bindings/llama.cpp/ggml-cuda.cu:6826: peer mapping resources exhausted

Describe the bug

When Tabby is started through Docker Compose with the GPU count set to 'all' or 10, it fails with: CUDA error 711 at /root/workspace/crates/llama-cpp-bindings/llama.cpp/ggml-cuda.cu:6826: peer mapping resources exhausted

Information about your version

tabbyml/tabby:latest    23bdb48b7956   2 weeks ago

Information about your GPU

cuda

Additional context

version: '3.5'

services:
  tabby:
    restart: always
    image: tabbyml/tabby
    command: serve --model TabbyML/DeepseekCoder-6.7B --device cuda
    volumes:
      - "$HOME/.tabby:/data"
    environment:
      NVIDIA_VISIBLE_DEVICES: all
      HTTPS_PROXY: http://127.0.0.1:2081
      HTTP_PROXY: http://127.0.0.1:2081
    ports:
      - 9999:8080
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 9 # Setting this to 'all' or 10 triggers the error; with any other value everything works normally.
              capabilities: [gpu]
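
The 9-versus-10 threshold is consistent with CUDA's documented limit of eight peer connections per device on systems without NVSwitch: with 9 visible GPUs each device has 8 peers, while with 10 it would need 9. One possible workaround, sketched below, is to pin the container to an explicit list of GPUs with `device_ids` instead of `count`. The IDs shown are an assumption about this host's GPU numbering, and the fragment has not been tested on the machine described above.

```yaml
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # Assumed host GPU indices; listing at most 9 devices
              # keeps every GPU at 8 peers or fewer.
              device_ids: ['0', '1', '2', '3', '4', '5', '6', '7', '8']
              capabilities: [gpu]
```

In Compose, `count` and `device_ids` are mutually exclusive, so this fragment would replace the `count: 9` line rather than sit alongside it.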

TabbyML/tabby #1162