abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

cuBLAS error 15 #941

Open harrypale opened 1 year ago

harrypale commented 1 year ago

Expected Behavior

I am trying to run privateGPT with CUDA acceleration. Without CUDA it works; with CUDA enabled, llama-cpp-python crashes.

Current Behavior

When I start privateGPT, llama-cpp-python-main fails with the following error (cuBLAS status 15 is CUBLAS_STATUS_NOT_SUPPORTED):

cuBLAS error 15 at C:\Users\Harry\Documents\llama-cpp-python-main\vendor\llama.cpp\ggml-cuda.cu:7586: the requested functionality is not supported

Environment and Context

I am on Windows 11, with NVIDIA CUDA Toolkit 12.2 installed, an NVIDIA GeForce GTX 960 4GB, and 32GB RAM. The CPU is an Intel i7-2600K (it should support AVX, but not AVX2 or FMA; see the check below).

$ python3 --version
Python 3.11.6

$ make --version
GNU Make 4.4.1

$ g++ --version
g++ (x86_64-posix-seh-rev0, Built by MinGW-Builds project) 13.2.0
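
To double-check the CPU flags behind the CMake options used below, here is a quick sketch using the third-party py-cpuinfo package (pip install py-cpuinfo); on an i7-2600K, avx should be present while avx2 and fma should not:

# Sketch: list the SIMD feature flags the CPU exposes (assumes the
# py-cpuinfo package is installed). A Sandy Bridge i7-2600K should
# report 'avx' but not 'avx2' or 'fma', matching the build flags below.
from cpuinfo import get_cpu_info

flags = set(get_cpu_info().get("flags", []))
for feature in ("avx", "avx2", "fma"):
    print(f"{feature}: {'yes' if feature in flags else 'no'}")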

Failure Information (for bugs)

cuBLAS error 15 at C:\Users\Harry\Documents\llama-cpp-python-main\vendor\llama.cpp\ggml-cuda.cu:7586: the requested functionality is not supported

Steps to Reproduce

git clone https://github.com/imartinez/privateGPT
cd privateGPT

conda create -n privateGPT python=3.11
conda activate privateGPT

poetry install --with ui,local

poetry run python scripts/setup

cd ..

git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python.git llama-cpp-python-main
cd llama-cpp-python-main

set FORCE_CMAKE=1
set CMAKE_ARGS=-DLLAMA_CUBLAS=on -DLLAMA_AVX=on -DLLAMA_AVX2=off -DLLAMA_FMA=off
python -m pip install .[all]

cd ..
cd privateGPT
make run
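
To isolate llama-cpp-python from privateGPT, a minimal smoke test like the sketch below can be run right after the pip install step (the model path is a placeholder; n_gpu_layers=1 is enough to trigger the CUDA code path):

# Sketch: verify the cuBLAS build outside privateGPT. With a working
# build, the verbose load log reports CUDA offload info; on an affected
# card it should hit the same cuBLAS error once a layer runs on the GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="models/your-model.gguf",  # placeholder: any local GGUF model
    n_gpu_layers=1,  # offloading a single layer is enough to exercise cuBLAS
    verbose=True,    # print the ggml/llama.cpp initialization log
)
print(llm("Hello", max_tokens=8))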

Let me know if I can give you more info, thank you.

abetlen commented 1 year ago

@harrypale can you try building llama.cpp directly with CMake using the same setup and post those build logs? That may help narrow this down.

blanks-hub commented 1 year ago

I ran into the same problem with almost the same setup (I'm running a GTX 970 4GB):

cuBLAS error 15 at C:\Users\$USER\AppData\Local\Temp\pip-install-odq455g9\llama-cpp-python_6254bfd1892540b2aada837341f178f9\vendor\llama.cpp\ggml-cuda.cu:7594: the requested functionality is not supported.

My install starts without a problem but when I try to prompt privateGPT it crashes and produces this error. Help is greatly appreciated!

UPDATE: A friend of mine found the source of the problem: our GPUs don't support the half-precision (FP16) floating-point operations that llama.cpp's cuBLAS path requires. Our hardware's CUDA compute capability is 5.2, while privateGPT was tested on 8.6 (see NVIDIA's compute capability table). I guess this means GPU support is over for us :(
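
For anyone else hitting this, a quick way to confirm the compute capability is the NVML sketch below (assumes the nvidia-ml-py package, imported as pynvml, is installed). Maxwell cards like the GTX 960/970 report 5.2, while native FP16 arithmetic in CUDA requires 5.3 or newer:

# Sketch: query each GPU's CUDA compute capability via NVML
# (pip install nvidia-ml-py). A GTX 960/970 (Maxwell) reports (5, 2);
# native half-precision arithmetic needs compute capability >= 5.3.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        major, minor = pynvml.nvmlDeviceGetCudaComputeCapability(handle)
        print(f"GPU {i}: compute capability {major}.{minor}")
finally:
    pynvml.nvmlShutdown()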