vivekshinde27 opened 6 months ago
Install again with --verbose and make sure "CUDA found" is in the output (the full command is a few lines below).
Check your CUDA installation in your environment variables, and make sure it is also on your PATH.
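If you want to script that check, here is a minimal sketch in Python (assuming the CUDA_PATH variable that the Windows CUDA Toolkit installer normally sets):

```python
import os
import shutil

# CUDA_PATH is set by the Windows CUDA Toolkit installer
print("CUDA_PATH:", os.environ.get("CUDA_PATH"))

# nvcc resolves through PATH if the toolkit's bin folder is on it
print("nvcc:", shutil.which("nvcc"))
```

Both lines should print real paths; a None means CMake probably cannot find CUDA either.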
Now open the CMake GUI, or just check whether CMake picks up all the system variables.
Be sure to have installed both cuDNN AND CUDA on your Windows machine (nvcc --version and nvidia-smi are quick ways to confirm).
LLAMA_CUBLAS is replaced by LLAMA_CUDA, so use this command instead:
set "CMAKE_ARGS=-DLLAMA_CUDA=on" && set "FORCE_CMAKE=1" && pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
Please let me know if any of this solved your issue. Also, are you trying to run the [server] version or the normal one?
I want to run my GGUF model on the GPU for inference, so I have done the following:
Whatever inference it does, it runs on the CPU instead of the GPU. I want the model to run inference on the GPU instead of the CPU; I have 2 NVIDIA RTX A5000 GPUs. Could you kindly explain why this is happening?
PS: I have already tried reinstalling the CUDA Toolkit.
```
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 |
```
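The BLAS = 0 in that banner indicates the installed build has no GPU/BLAS backend, which is consistent with everything running on the CPU. Once rebuilt with -DLLAMA_CUDA=on, a load that offloads all layers and splits them across the two A5000s might look like this sketch (assuming the llama_cpp.Llama API; the tensor_split values are illustrative):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",   # placeholder: point this at your GGUF file
    n_gpu_layers=-1,           # offload all layers; the default 0 keeps inference on the CPU
    main_gpu=0,                # device that hosts small/scratch tensors
    tensor_split=[0.5, 0.5],   # illustrative: split layers evenly across the two GPUs
    verbose=True,
)
```

Note that even with a CUDA-enabled build, n_gpu_layers must be set explicitly; leaving it at its default of 0 keeps inference on the CPU.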