abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Even after installing CUDA + PyTorch + llama-cpp-python I always see BLAS = 0 #1315

Open vivekshinde27 opened 6 months ago

vivekshinde27 commented 6 months ago

I want to run my GGUF model on the GPU for inference, so I have done the following:

  1. Installed Visual Studio Community 2022
  2. Installed the Visual Studio Build Tools
  3. Installed CUDA Toolkit 12.1
  4. Installed cuDNN for CUDA 12.x
  5. Created a new conda environment
  6. Installed a PyTorch build compatible with CUDA
  7. Checked that import torch followed by torch.cuda.is_available() returns True
  8. Checked that the CUDA path is present in the system environment variables
  9. Installed llama-cpp-python with the following commands:
     set CMAKE_ARGS=-DLLAMA_CUBLAS=on
     set FORCE_CMAKE=1
     pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
  10. The installation completes without any errors.
  11. But when I run the model, the console still reports BLAS = 0 (see the sketch after this list for a quick way to check the build from Python).
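
A quick way to check whether the installed wheel itself was built with GPU support, without loading any model, is to print the backend's system info from Python. This is only a minimal sketch and assumes the low-level llama_print_system_info / llama_supports_gpu_offload wrappers are exposed in the installed version:

```python
# Minimal sketch: check whether the installed llama-cpp-python wheel was
# built with GPU/BLAS support, without loading a model.
# Assumes the low-level wrappers below are exposed in your version.
import llama_cpp

print("llama-cpp-python version:", llama_cpp.__version__)

# Same "AVX = 1 | ... | BLAS = 0 | ..." line that shows up in the console;
# BLAS = 1 indicates a BLAS/CUDA-enabled build.
print(llama_cpp.llama_print_system_info().decode("utf-8"))

# Should return True for a CUDA-enabled build.
print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())
```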

All inference runs on the CPU instead of the GPU.

I want the model to run inference on the GPU instead of the CPU. I have 2 NVIDIA RTX A5000 GPUs.

Could you please help me understand why this is happening?

PS: I have also tried reinstalling the CUDA Toolkit.

AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 |

abetlen commented 6 months ago

Install again with --verbose and make sure CUDA Found is in the output.
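
If it helps to keep that verbose log around for searching, here is a minimal sketch (an illustration, not the maintainer's exact workflow) that reruns the build from Python with the CUDA CMake flag set, saves the full log, and filters the output for CUDA-related lines; the log file name is arbitrary:

```python
# Minimal sketch: rerun the verbose source build with the CUDA flag set,
# save the full log, and print any lines that mention CUDA.
import os
import subprocess
import sys

env = dict(os.environ)
env["CMAKE_ARGS"] = "-DLLAMA_CUDA=on"  # older releases used -DLLAMA_CUBLAS=on
env["FORCE_CMAKE"] = "1"

result = subprocess.run(
    [sys.executable, "-m", "pip", "install", "llama-cpp-python",
     "--force-reinstall", "--upgrade", "--no-cache-dir", "--verbose"],
    env=env,
    capture_output=True,
    text=True,
)

# Keep the complete log for later inspection (arbitrary file name).
with open("llama_cpp_build.log", "w", encoding="utf-8") as f:
    f.write(result.stdout)
    f.write(result.stderr)

# If CMake found your toolkit, CUDA-related lines should appear here.
for line in (result.stdout + result.stderr).splitlines():
    if "cuda" in line.lower():
        print(line)
```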

Darthph0enix7 commented 5 months ago

Check your CUDA installation in your environment variables (screenshot of the system environment variables attached)

and inside the "Path" variable (screenshot attached).

Now open the CMake GUI, or just check whether CMake picks up all of the system variables.

Make sure you have installed both cuDNN AND CUDA on your Windows machine.

LLAMA_CUBLAS has been replaced by LLAMA_CUDA, so use this command instead:

set "CMAKE_ARGS=-DLLAMA_CUDA=on" && set "FORCE_CMAKE=1" && pip install llama-cpp-python--force-reinstall --upgrade --no-cache-dir --verbose

Please let me know if any of this solved your issue. Are you trying to run the [server] version or the normal one?
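
Once the rebuild reports a CUDA-enabled build, a short test load should confirm that layers are actually offloaded: with verbose logging, the startup output should show BLAS = 1 and a line about layers being offloaded to the GPU. A minimal sketch, assuming a local GGUF file (the path below is just a placeholder):

```python
# Minimal sketch: load a model with full GPU offload and verbose logging
# to confirm the CUDA build is being used.
from llama_cpp import Llama

llm = Llama(
    model_path="models/your-model.gguf",  # placeholder path, point at any local GGUF
    n_gpu_layers=-1,  # offload all layers to the GPU
    verbose=True,     # prints the system info line (BLAS = 1 on a CUDA build)
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```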