abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

I can't run the Llama() function on GPU #1221

Open tomwarias opened 8 months ago

tomwarias commented 8 months ago

I am using the Llama() function for a chatbot in the terminal, but when I set n_gpu_layers=-1 (or any other number) the GPU is never engaged in the computation. By comparison, when I set it in LM Studio it works perfectly and fast. I want the same thing, but in the terminal. Does anyone know what the problem could be?

```python
model = Llama(model_path="zephyr-7b-beta.Q4_K_M.gguf", n_gpu_layers=-1, n_ctx=2048, verbose=False)
```
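
One quick diagnostic sketch: re-run the same call with verbose=True. A CUDA-enabled build prints its model-load log to stderr, including a line reporting how many layers were offloaded to the GPU (something like "offloaded 33/33 layers to GPU"); a CPU-only wheel prints no such line no matter what n_gpu_layers is set to.

```python
from llama_cpp import Llama

# Same call as above, but with verbose=True so llama.cpp's load log is
# printed to stderr. Look for a line reporting offloaded layers; if none
# appears, the installed wheel was built without GPU support.
model = Llama(
    model_path="zephyr-7b-beta.Q4_K_M.gguf",
    n_gpu_layers=-1,  # -1 asks to offload all layers, when the build allows it
    n_ctx=2048,
    verbose=True,
)
```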

AmarTrivedi1 commented 8 months ago

Did you install with GPU support, or just the basic CPU build?

If you run it without the n_gpu_layers argument, does it work? And in Task Manager, do you see your RAM and CPU being utilized?
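
If you're not sure which build you have, recent versions of llama-cpp-python expose llama.cpp's llama_supports_gpu_offload() through the low-level bindings; a minimal sketch, assuming your installed version has it:

```python
import llama_cpp

# Returns False for a CPU-only build, in which case n_gpu_layers is
# effectively ignored and the model runs entirely on the CPU.
print(llama_cpp.llama_supports_gpu_offload())
```

If it prints False, the usual fix at the time of this thread was to reinstall with the CUDA CMake flag, e.g. CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python (newer releases use -DGGML_CUDA=on instead).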

Garstig commented 8 months ago

What system are you using?

I had similar issues on WSL with Ubuntu. The problem was that the library did not find the NVIDIA CUDA toolkit.

The Ubuntu solution mentioned here worked for me. They also have a solution for Windows, but I did not test it.
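
A rough sanity check from inside WSL, assuming WSL2 with the Windows NVIDIA driver installed (which exposes the driver library under /usr/lib/wsl/lib): if Python cannot load libcuda.so.1, llama.cpp's CUDA backend cannot either.

```python
import ctypes

# Try to load the CUDA driver library the same way the CUDA backend would.
try:
    ctypes.CDLL("libcuda.so.1")
    print("libcuda.so.1 found: the CUDA driver is visible in this environment")
except OSError:
    print("libcuda.so.1 not found: the CUDA driver is not on the loader path")
```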

tomwarias commented 8 months ago

Yes, I have CUDA 11.8. In Task Manager I only see RAM and CPU usage, with or without the n_gpu_layers argument. I am using WSL.

hockeybro12 commented 8 months ago

I'm also having issues with the latest version; 0.2.33 works fine.

freckletonj commented 5 months ago

Same here. There seem to be several threads that address this, but no solutions yet:
https://github.com/abetlen/llama-cpp-python/issues/1123#issuecomment-2153405068
https://github.com/abetlen/llama-cpp-python/issues/1310#issuecomment-2153424941