Open · 51-matt opened this issue 11 months ago
Hi, I have an issue related to GPU acceleration. When I run Command 1 below, the GPU is used, but it is not used with Command 2.

Command1 : ./main -m /MYPATH/ggml-model-q4_0.bin --color -p "MYQUESTION" -n 256 -ngl 45 --in-prefix
Result1 : blas=1 (80 tokens/s)

However, when I use the LlamaCpp model with GPU acceleration enabled, it runs much slower.

Command2 :
    # imports assume LangChain's LlamaCpp wrapper
    from langchain.callbacks.manager import CallbackManager
    from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
    from langchain.llms import LlamaCpp

    callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
    llm = LlamaCpp(
        model_path="/MYPATH/ggml-model-q4_0.bin",
        n_gpu_layers=45,
        n_batch=512,
        max_length=1024,
        n_ctx=1024,
        callback_manager=callback_manager,
        verbose=True,
    )
    llm.predict("MYQUESTION")
Result2 : blas=0 (7.8 tokens/s)

I'm wondering what might cause this discrepancy. Can you help me identify and correct the issue?
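For reference, Command2 can also be reproduced without LangChain by calling llama-cpp-python directly; the sketch below is a minimal version of that (the model path and prompt are the same placeholders as above, and n_gpu_layers mirrors Command1's -ngl 45), which helps tell apart a problem in the wrapper from a problem in the underlying llama-cpp-python build:

```python
# Minimal direct use of llama-cpp-python, bypassing the LangChain wrapper.
# "/MYPATH/..." and "MYQUESTION" are the placeholders from the report above.
from llama_cpp import Llama

llm = Llama(
    model_path="/MYPATH/ggml-model-q4_0.bin",
    n_gpu_layers=45,   # same offload setting as Command1's -ngl 45
    n_ctx=1024,
    n_batch=512,
    verbose=True,      # prints the startup banner, including the BLAS flag
)

out = llm("MYQUESTION", max_tokens=256)
print(out["choices"][0]["text"])
```

If this direct call is also slow and the startup banner shows BLAS = 0, the problem is in the installed llama-cpp-python package rather than in LangChain.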
Check https://github.com/abetlen/llama-cpp-python/issues/695#issuecomment-1869176032. I had the same problem: llama-cpp-python doesn't use the GPU. You can check whether it is actually being used with nvtop.
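As a concrete version of that check, the sketch below (a rough helper of my own, not something taken from the linked comment) captures the C-level startup log that llama.cpp prints when verbose=True and looks for the BLAS flag:

```python
# Rough check of whether the installed llama-cpp-python build can use the GPU:
# capture the C-level stderr written during model load and inspect the banner.
import os
import sys
import tempfile

from llama_cpp import Llama

def load_and_capture_log(model_path: str, n_gpu_layers: int = 45) -> str:
    """Load the model with verbose=True and return llama.cpp's startup log."""
    log_file = tempfile.TemporaryFile(mode="w+")
    stderr_fd = sys.stderr.fileno()
    saved_fd = os.dup(stderr_fd)            # keep the original stderr
    os.dup2(log_file.fileno(), stderr_fd)   # route C-level stderr into the temp file
    try:
        Llama(model_path=model_path, n_gpu_layers=n_gpu_layers, verbose=True)
    finally:
        os.dup2(saved_fd, stderr_fd)        # restore stderr
        os.close(saved_fd)
    log_file.seek(0)
    return log_file.read()

# Placeholder path from the report above.
log = load_and_capture_log("/MYPATH/ggml-model-q4_0.bin")
# The banner contains "BLAS = 1" when the library was built against a BLAS/GPU
# backend and "BLAS = 0" for a plain CPU wheel.
print("BLAS/GPU build detected" if "BLAS = 1" in log else "CPU-only build (BLAS = 0)")
```

If it reports a CPU-only build, the usual remedy for that era of llama-cpp-python was to force a source reinstall with the CUDA backend enabled, e.g. CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --force-reinstall --no-cache-dir llama-cpp-python, and then confirm GPU usage with nvtop as suggested above.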