abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

GPU does not work with llama-cpp-python (LangChain), please help #977

Open 51-matt opened 11 months ago

51-matt commented 11 months ago

Hi, I have an issue related to GPU acceleration. The GPU is used when I run llama.cpp's `./main` binary directly (Command 1), but not when I run the same model through LangChain (Command 2 below).

Command 1:

```
./main -m /MYPATH/ggml-model-q4_0.bin --color -p "MYQUESTION" -n 256 -ngl 45 --in-prefix
```

Result 1: BLAS = 1 (80 tokens/s)

However, when I load the same model through LangChain's LlamaCpp wrapper with what should be the same GPU settings, generation is much slower.

Command 2:

```python
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llm = LlamaCpp(
    model_path="/MYPATH/ggml-model-q4_0.bin",
    n_gpu_layers=45,
    n_batch=512,
    max_length=1024,
    n_ctx=1024,
    callback_manager=callback_manager,
    verbose=True,
)
llm.predict("MYQUESTION")
```

Result 2: BLAS = 0 (7.8 tokens/s)

I'm wondering what could be causing this discrepancy. Can you help me identify and fix the issue?
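BLAS = 0 in the second run indicates that the llama-cpp-python wheel itself was built without GPU/BLAS support, independent of any LangChain settings. One way to check this directly, without going through LangChain, is a minimal sketch like the following (it assumes a build recent enough to expose llama.cpp's system-info call through the bindings):

```python
# Does the installed llama-cpp-python wheel report BLAS/GPU support?
# llama_print_system_info() wraps llama.cpp's C function of the same name
# and returns a bytes string containing flags such as "BLAS = 1".
from llama_cpp import llama_print_system_info

print(llama_print_system_info().decode())
```

If this prints `BLAS = 0`, no amount of `n_gpu_layers` tuning will help until the package is rebuilt with GPU support.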

gcapozzo commented 10 months ago

Check https://github.com/abetlen/llama-cpp-python/issues/695#issuecomment-1869176032. I had the same problem: llama-cpp-python wasn't using the GPU. You can verify whether the GPU is actually being used with nvtop.
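For reference, the fix discussed in that thread boils down to forcing a source rebuild of llama-cpp-python with GPU offload compiled in. A sketch for an NVIDIA/CUDA setup, using the `LLAMA_CUBLAS` flag naming from the releases contemporary with this issue (newer versions use `-DGGML_CUDA=on` instead):

```
# Rebuild llama-cpp-python from source with cuBLAS offload enabled
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall --no-cache-dir llama-cpp-python
```

After reinstalling, the verbose startup log (and the system-info check above) should report BLAS = 1, and layers offloaded with `n_gpu_layers` should show up as GPU activity in nvtop.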