Closed lmorandini closed 11 months ago
It turned out that there was a bug in llama.cpp that was fixed yesterday in llama_cpp_python.
Since the fix has not been released in a tagged version yet, the Dockerfile has to be changed to do a development build:
```dockerfile
# Install llama-cpp-python (build with CUDA)
RUN git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git
WORKDIR llama-cpp-python
RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install -e .[server]
```
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
The Docker container that `cuda_simple` runs in does not crash when a request is sent.
Current Behavior
Whenever a request is sent, the program crashes with this message:
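For reference, the crash can be triggered with any completion request to the server's OpenAI-compatible endpoint. The sketch below shows such a request built with the standard library; the host, port, and payload are assumptions for illustration, not copied from the actual deployment:

```python
import json
import urllib.request

# Assumed server location for illustration; adjust to the actual deployment.
SERVER_URL = "http://localhost:8000/v1/completions"

def build_completion_request(prompt: str, max_tokens: int = 16) -> urllib.request.Request:
    """Build a POST to the server's OpenAI-compatible completions endpoint."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        SERVER_URL, data=body, headers={"Content-Type": "application/json"}
    )

def send(prompt: str) -> dict:
    """Sending any such request is what triggers the reported crash."""
    with urllib.request.urlopen(build_completion_request(prompt)) as resp:
        return json.load(resp)
```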
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05    Driver Version: 525.147.05    CUDA Version: 12.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID A100D-80C      On   | 00000000:00:07.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |  47880MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    244536      C   python3                         47879MiB |
+-----------------------------------------------------------------------------+
```
Linux llamab 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Linux 36009f992dd3 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05    Driver Version: 525.147.05    CUDA Version: 12.1   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID A100D-80C      On   | 00000000:00:07.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |  47880MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
```yaml
model:
  image: cuda_simple:${MODEL_VERSION}
  deploy:
    replicas: ${N_MODEL_REPLICAS}
    resources:
      reservations:
        devices:
```
This is necessary because the server has CUDA 12.0 while CUDA 12.1 is requested (it avoids checking the CUDA version at start).
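For context, a complete GPU reservation in Compose typically looks like the sketch below. The `driver`, `count`, and `capabilities` values, and the `NVIDIA_DISABLE_REQUIRE` variable used to skip the start-up CUDA version check, are assumptions for illustration, not copied from the actual file:

```yaml
# Sketch only: a typical GPU reservation for a service like the one above.
model:
  image: cuda_simple:${MODEL_VERSION}
  environment:
    # Assumed: tells the NVIDIA container runtime to skip its CUDA version
    # check at start (CUDA 12.0 on the host vs. 12.1 requested by the image).
    NVIDIA_DISABLE_REQUIRE: 1
  deploy:
    replicas: ${N_MODEL_REPLICAS}
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
```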
Failure Information (for bugs)
It looks like a bug. It used to work until a few days ago, but on a different VM, so it may be related to some subtle environment change.
Steps to Reproduce
The image was built from the current state of the `main` branch (commit `96a377648c97113f443cafd41b6b9ae7f0e4e5ef`) using the provided Dockerfile.
Failure Logs