vivekshinde27 opened 6 months ago
Install again with --verbose and make sure "CUDA found" is in the output (the full command is a few lines below).
Check your CUDA installation in your environment variables, and make sure it is also on your PATH.
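If you want to script that check, here is a minimal sketch in Python (assuming the CUDA_PATH variable that the Windows CUDA Toolkit installer normally sets):

```python
import os
import shutil

# CUDA_PATH is set by the Windows CUDA Toolkit installer
print("CUDA_PATH:", os.environ.get("CUDA_PATH"))

# nvcc resolves through PATH if the toolkit's bin folder is on it
print("nvcc:", shutil.which("nvcc"))
```

Both lines should print real paths; a None means CMake probably cannot find CUDA either.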
Now open the CMake GUI, or just check whether CMake picks up all the system variables.
Be sure to have installed both cuDNN AND CUDA on your Windows machine (nvcc --version and nvidia-smi are quick ways to confirm).
LLAMA_CUBLAS is replaced by LLAMA_CUDA, so use this command instead:
set "CMAKE_ARGS=-DLLAMA_CUDA=on" && set "FORCE_CMAKE=1" && pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
Please let me know if any of this solved your issue. Also, are you trying to run the [server] version or the normal one?
I want to run my GGUF model on the GPU for inference, so I have done the following:
Whatever inference it does, it runs on the CPU instead of the GPU. I want the model to run inference on the GPU instead of the CPU; I have 2 NVIDIA RTX A5000 GPUs. Could you kindly explain why this is happening?
PS: I have already tried reinstalling the CUDA Toolkit.
```
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 |
```
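The BLAS = 0 in that banner indicates the installed build has no GPU/BLAS backend, which is consistent with everything running on the CPU. Once rebuilt with -DLLAMA_CUDA=on, a load that offloads all layers and splits them across the two A5000s might look like this sketch (assuming the llama_cpp.Llama API; the tensor_split values are illustrative):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",   # placeholder: point this at your GGUF file
    n_gpu_layers=-1,           # offload all layers; the default 0 keeps inference on the CPU
    main_gpu=0,                # device that hosts small/scratch tensors
    tensor_split=[0.5, 0.5],   # illustrative: split layers evenly across the two GPUs
    verbose=True,
)
```

Note that even with a CUDA-enabled build, n_gpu_layers must be set explicitly; leaving it at its default of 0 keeps inference on the CPU.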