abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

CUDA error 716 running the cuda_simple image #919

Closed lmorandini closed 11 months ago

lmorandini commented 11 months ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

The Docker container built from the cuda_simple Dockerfile does not crash when a request is sent.

Current Behavior

Whenever a request is sent, the program crashes with this message:

CUDA error 716 at /tmp/pip-install-wor20xk7/llama-cpp-python_3077f152adad4f479ee5f8ba791fa89a/vendor/llama.cpp/ggml-cuda.cu:7104: misaligned address
current device: 0

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID A100D-80C       On  | 00000000:00:07.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |  47880MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    244536      C   python3                         47879MiB |
+-----------------------------------------------------------------------------+


* Operating System, e.g. for Linux:
For the VM:

Linux llamab 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux


* SDK version, e.g. for Linux:
The image, built using the Dockerfile under cuda_simple, has the following characteristics:

Linux 36009f992dd3 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID A100D-80C       On  | 00000000:00:07.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |  47880MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+


The Docker container was started from a Docker Compose file:

model:
  image: cuda_simple:${MODEL_VERSION}
  deploy:
    replicas: ${N_MODEL_REPLICAS}
    resources:
      reservations:
        devices:

Failure Information (for bugs)

It looks like a bug. It used to work until a few days ago, albeit on a different VM, so it may be related to some subtle environment change.

Steps to Reproduce

The image was built from the current state of the main branch (commit `96a377648c97113f443cafd41b6b9ae7f0e4e5ef`) using the provided Dockerfile.
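
Once the container is up, any completion request triggers the crash. Below is a minimal reproduction sketch, assuming the server listens on the llama_cpp.server default port 8000 and exposes the OpenAI-compatible /v1/completions endpoint; the host, prompt, and token count are placeholders, not taken from the original report:

import json
import urllib.request

# Placeholder request: any completion call against the affected build
# reproduces the misaligned-address crash.
req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps({"prompt": "Hello", "max_tokens": 16}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["text"])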

Failure Logs

CUDA error 716 at /tmp/pip-install-wor20xk7/llama-cpp-python_3077f152adad4f479ee5f8ba791fa89a/vendor/llama.cpp/ggml-cuda.cu:7104: misaligned address
current device: 0

lmorandini commented 11 months ago

It turned out that there was a bug in llama.cpp whose fix landed in llama-cpp-python yesterday.

Since the fix has not been released yet, the Dockerfile has to be changed to do a development build:

# Install llama-cpp-python from the latest sources (build with CUDA)
RUN git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
WORKDIR llama-cpp-python
# Editable install straight from the checkout so the unreleased fix is included
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install -e ".[server]"
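
To check that the rebuilt image actually picks up the fix, a direct completion through the Python bindings works as a smoke test. A minimal sketch, assuming a GGUF model is available at the placeholder path below; on the broken build this call crashed with the misaligned-address error:

import llama_cpp

llm = llama_cpp.Llama(
    model_path="/models/model.gguf",  # placeholder path to any GGUF model
    n_gpu_layers=-1,                  # offload all layers to the GPU
)
out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])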