abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Install with pytorch's own cudatoolkit? #720

Open fcivardi opened 1 year ago

fcivardi commented 1 year ago

I'm using a server running Ubuntu 20.04.6 LTS with a V100 GPU. I'm not an admin, so I can't install the CUDA toolkit at the system level. I installed pytorch (with conda), which ships its own cudatoolkit. I have no problem using HF models with Langchain's HuggingFacePipeline (they use the GPU), but I do have a problem with llama-cpp-python.

I installed with:

```
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```

It completes without error, but when I load the model it doesn't print "found 1 CUDA devices"; I see BLAS = 0 and the GPU is not used. My impression is that it was not compiled with CUDA. Should I set some additional environment variables when installing llama-cpp-python so that it knows the CUDA libraries are in ~/.conda/envs/llama/lib? I already tried, in the notebook:

```
!export LLAMA_CPP_LIB=~/.conda/envs/llama/lib/libllama.so
from llama_cpp import Llama
```

but the GPU is still not used. This is why I think the problem is at install time, not import time.
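For reference, this is roughly how I load the model and check whether the GPU is picked up (a minimal sketch; the model path and the n_gpu_layers value are placeholders for my actual setup):

```python
# Sketch of the loading/check step; model path and n_gpu_layers are
# placeholders, adjust for your own model and GPU memory.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # placeholder path
    n_gpu_layers=35,  # offload only works if the wheel was built with cuBLAS
    verbose=True,     # prints system info on load; look for "BLAS = 1" here
)
```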

abetlen commented 1 year ago

Hey @fcivardi I'm not too familiar with the conda CUDA packages, but is it possible to set your CUDA_HOME and CUDA_TOOLKIT_ROOT_DIR environment variables to point to them before doing the installation? You'll likely need to reinstall with --force-reinstall --no-cache-dir --verbose and make sure there's a "CUDA Found" or "CUBLAS Found" in the cmake logs.
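Something along these lines (a sketch, assuming the conda env is named `llama` and the CUDA toolkit lives under its prefix; adjust the paths for your setup):

```bash
# Assumption: the conda env is called "llama" and ships the CUDA toolkit
# under its prefix; point cmake's CUDA discovery variables at it.
export CUDA_HOME=$HOME/.conda/envs/llama
export CUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME

# Rebuild from source so cmake picks the variables up, then check the
# verbose build output for "CUDA Found" / "CUBLAS Found".
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python \
    --force-reinstall --no-cache-dir --verbose
```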