ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

'Invalid device 0' etc. during inference inside GPU-enabled Docker image with NVIDIA Container Toolkit #2032

Open · atopheim opened this issue 3 months ago

atopheim commented 3 months ago

Might be related to #1852. I took inspiration from the Dockerfile in .devops.

I've been running NVIDIA containers with the container toolkit installed, with the same environment on the host as in the container, and the same version of main on the host system as in the container.

Steps to reproduce below

atopheim commented 3 months ago

Steps to reproduce:

Copy main-cuda.Dockerfile to the repository root on commit 13c2232 (HEAD -> master, origin/master, origin/HEAD, "sync : ggml"), then build and run:

path/to/whisper.cpp$ cp .devops/main-cuda.Dockerfile whisper-with-GPU.Dockerfile
path/to/whisper.cpp$ docker build -f whisper-with-GPU.Dockerfile -t whisper-gpu .
path/to/whisper.cpp$ docker run -it --rm \
    -v ~/Documents/Github/whisper_cpp/models:/models \
    -v ~/Documents/Github/whisper.cpp/samples:/audios \
    --gpus all whisper-gpu "./main -m /models/ggml-small.bin -f /audios/jfk.wav"

Output from running nvidia-smi inside the stock CUDA images:

~/D/G/a/whisper.cpp► docker run --rm --gpus all nvidia/cuda:12.3.2-devel-ubuntu22.04 nvidia-smi

==========
== CUDA ==
==========

CUDA Version 12.3.2

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Tue Apr  9 14:00:14 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080 ...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   49C    P8             19W /  115W |     200MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
~/D/G/a/whisper.cpp► docker run --rm --gpus all nvidia/cuda:12.4.0-devel-ubuntu22.04 nvidia-smi

==========
== CUDA ==
==========

CUDA Version 12.4.0

(license banner and nvidia-smi output identical to the 12.3.2 run above)
dennorske commented 3 months ago

I am having the same issue.

On the host, nvcc --version reports 12.0, and the host's nvidia-smi prints:

| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |

I have attempted to build .devops/main-cuda.Dockerfile manually, trying CUDA versions 12.3.1, 12.3.0, and 12.0.0, with the CUDA main version set to 12.3 and 12.0 respectively.
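For reference, a sketch of overriding those versions through build args instead of editing the file, assuming the ARG names CUDA_MAIN_VERSION and CUDA_VERSION used by .devops/main-cuda.Dockerfile (check your checkout):

docker build -f .devops/main-cuda.Dockerfile \
    --build-arg CUDA_MAIN_VERSION=12.0 \
    --build-arg CUDA_VERSION=12.0.0 \
    -t whisper:latest .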

It builds normally. When I run it, I tried:

docker run -it --rm --gpus "device=0" -v /home/simen/development/minutemind.co/dj/ai/whisper_cpp/models:/models whisper:latest "nvidia-smi"

It prints:

NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3

which matches the host system exactly.

Coming to the point where I run the actual container workload with the example, using the following command:

docker run -it --rm --gpus "device=0" -v /ai/whisper_cpp/models:/models whisper:latest "./main -m /models/ggml-large-v2.bin -f ./samples/jfk.wav"

It errors with:

whisper_backend_init: using CUDA backend
ggml_cuda_init: failed to initialize CUDA: system has unsupported display driver / cuda driver combination
ggml_backend_cuda_init: error: invalid device 0
whisper_backend_init: ggml_backend_cuda_init() failed
whisper_model_load:      CPU total size =  3093.99 MB
whisper_model_load: model size    = 3093.99 MB
whisper_backend_init: using CUDA backend
ggml_backend_cuda_init: error: invalid device 0
whisper_backend_init: ggml_backend_cuda_init() failed

It is a bit weird, and possibly just user error, but any advice?

My system has both an RTX A2000 and an RTX 3080.
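With two cards installed, a quick sketch for checking what each device index maps to inside the container (a hypothetical diagnostic, not something anyone in this thread ran; it assumes the image's entrypoint passes its argument through a shell, as the nvidia-smi call above does, and uses nvidia-smi -L to list visible devices):

docker run -it --rm --gpus "device=0" whisper:latest "nvidia-smi -L"
docker run -it --rm --gpus "device=1" whisper:latest "nvidia-smi -L"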

schlagert commented 3 months ago

I ran into the same problem. After a good bit of research I found that main-cuda.Dockerfile has some issues. When compiling with CUDA support, you need to distinguish between the compile phase and the runtime phase.

When you build the image with docker build (without mapping a graphics card into the container), the build should link against the CUDA library stubs (e.g. libcuda.so). When you run the resulting image with a graphics card mapped into the container via the NVIDIA Container Toolkit, the toolkit provides the host runtime's CUDA libraries inside the container. It is therefore strongly advised not to link your CUDA program against the compat libraries provided in the build container: doing so prevents the resulting image from working whenever host and container do not have exactly the same CUDA version. For more info on this, refer to this issue.
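A quick way to see which linking you ended up with (a hypothetical check, assuming an image named whisper:latest whose entrypoint runs its argument through a shell, like the calls earlier in this thread):

# Show which CUDA driver library the binary resolves at run time. A path
# under /usr/local/cuda/compat or a stubs directory means the binary is
# bound to the build container's libraries rather than the driver libraries
# injected by the NVIDIA Container Toolkit.
docker run --rm --gpus all whisper:latest "ldd ./main | grep libcuda"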

To make the image work, I basically removed the LD_LIBRARY_PATH directives (they are not needed because the NVIDIA base image makes the libraries available via ldconfig). However, you also need to change the Makefile so it does not run the main program after building, because that fails when no graphics card is mapped in during the image build. Finally, the NVIDIA architecture is not passed correctly, since the Dockerfile uses the wrong environment variable; to make this work, set CUDA_ARCH_FLAG instead.

For me it worked using the following Dockerfile:

ARG UBUNTU_VERSION=22.04
ARG CUDA_VERSION=12.3.1
ARG BASE_CUDA_DEV_CONTAINER=nvidia/cuda:${CUDA_VERSION}-devel-ubuntu${UBUNTU_VERSION}
ARG BASE_CUDA_RUN_CONTAINER=nvidia/cuda:${CUDA_VERSION}-runtime-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_DEV_CONTAINER} AS build
WORKDIR /app

# Build CUDA kernels for all supported architectures; WHISPER_CUDA enables
# CUDA in the whisper.cpp Makefile.
ENV CUDA_ARCH_FLAG=all
ENV WHISPER_CUDA=1

RUN apt-get update && \
    apt-get install -y build-essential git wget && \
    rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

ARG WHISPER_VSN=v1.5.5
# Fetch the release tarball, strip the explicit CUDA include/lib paths from
# the Makefile, replace the post-build "./main -h" smoke test with
# "ldd ./main" (running main would need a GPU, which is unavailable at build
# time), and delete the sample wavs so the model targets only download the
# models instead of also running inference on the samples.
RUN wget https://github.com/ggerganov/whisper.cpp/archive/refs/tags/${WHISPER_VSN}.tar.gz && \
    tar --extract --strip-components=1 --gunzip --file ${WHISPER_VSN}.tar.gz && \
    rm -f ${WHISPER_VSN}.tar.gz && \
    perl -i -pe 's/-I\$\(CUDA_PATH\)\/targets\/\$\(UNAME_M\)-linux\/include//' Makefile && \
    perl -i -pe 's/-L\$\(CUDA_PATH\)\/targets\/\$\(UNAME_M\)-linux\/lib//' Makefile && \
    perl -i -pe 's/\.\/main -h/ldd \.\/main/' Makefile && \
    rm -f samples/*.wav && \
    make -j server small medium

FROM ${BASE_CUDA_RUN_CONTAINER} AS runtime
WORKDIR /app

RUN apt-get update && \
    apt-get install -y curl ffmpeg && \
    rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*

# Copy only the binaries and the downloaded models into the slim runtime image.
COPY --from=build /app/main /app/main
COPY --from=build /app/models/ggml-small.bin /app/small
COPY --from=build /app/models/ggml-medium.bin /app/medium
COPY --from=build /app/server /app/server

EXPOSE 5000/tcp
ENTRYPOINT [ "/app/server", "--host", "0.0.0.0", "--port", "5000", "--language", "de"]
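A hypothetical build-and-run sequence for this image (the file name whisper-cuda.Dockerfile and the tag are my own choices, not from the thread; the /inference call follows whisper.cpp's server example, and -m /app/small points at the model copied in above, appended after the image name since exec-form ENTRYPOINT arguments are passed through):

docker build -f whisper-cuda.Dockerfile -t whisper-cuda .
docker run --rm --gpus all -p 5000:5000 whisper-cuda -m /app/small

# from another shell: POST a 16 kHz wav to the bundled server
curl 127.0.0.1:5000/inference -F file=@jfk.wav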