abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Install with pytorch's own cudatoolkit? #720

Open fcivardi opened 1 year ago

fcivardi commented 1 year ago

I'm using a server running Ubuntu 20.04.6 LTS with a V100 GPU. I'm not an admin, so I can't install the CUDA toolkit at the system level. I installed pytorch (with conda), which ships its own cudatoolkit. I have no problem using HF models with Langchain's HuggingFacePipeline (they use the GPU), but I do have a problem with llama-cpp-python.

I installed with:

```
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```

It completes without error, but when I load the model it doesn't print "found 1 CUDA devices"; I see BLAS = 0 and the GPU is not used. My impression is that it was not compiled with CUDA. Should I set some additional environment variables when installing llama-cpp-python so that it knows the CUDA libraries are in ~/.conda/envs/llama/lib? I already tried, in the notebook:

```
!export LLAMA_CPP_LIB=~/.conda/envs/llama/lib/libllama.so
from llama_cpp import Llama
```

but the GPU is still not used. This is why I think the problem is at install time, not import time.
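For reference, this is roughly how I load the model and check whether the GPU is picked up (a minimal sketch; the model path and the n_gpu_layers value are placeholders for my actual setup):

```python
# Sketch of the loading/check step; model path and n_gpu_layers are
# placeholders, adjust for your own model and GPU memory.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # placeholder path
    n_gpu_layers=35,  # offload only works if the wheel was built with cuBLAS
    verbose=True,     # prints system info on load; look for "BLAS = 1" here
)
```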

abetlen commented 1 year ago

Hey @fcivardi I'm not too familiar with the conda CUDA packages, but is it possible to set your CUDA_HOME and CUDA_TOOLKIT_ROOT_DIR environment variables to point to them before doing the installation? You'll likely need to reinstall with --force-reinstall --no-cache-dir --verbose and make sure there's a "CUDA Found" or "CUBLAS Found" in the cmake logs.
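Something along these lines (a sketch, assuming the conda env is named `llama` and the CUDA toolkit lives under its prefix; adjust the paths for your setup):

```bash
# Assumption: the conda env is called "llama" and ships the CUDA toolkit
# under its prefix; point cmake's CUDA discovery variables at it.
export CUDA_HOME=$HOME/.conda/envs/llama
export CUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME

# Rebuild from source so cmake picks the variables up, then check the
# verbose build output for "CUDA Found" / "CUBLAS Found".
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python \
    --force-reinstall --no-cache-dir --verbose
```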