Open tomwarias opened 8 months ago
Did you install with GPU support or just the basic CPU build?
If you run it without the n_gpu_layers argument, does it work? And do you see your RAM and CPU being utilized in the task manager?
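A quick way to check from Python whether the installed build was compiled with GPU offload is sketched below. It assumes your llama-cpp-python version exposes the low-level `llama_supports_gpu_offload` binding, which only newer releases have. The helper name `gpu_offload_supported` is made up for this example, and it returns None when the package or the binding is missing.

```python
import importlib


def gpu_offload_supported():
    """Return True/False as reported by llama_cpp, or None if it cannot be determined."""
    try:
        llama_cpp = importlib.import_module("llama_cpp")
    except (ImportError, OSError, RuntimeError):
        # Package not installed, or its shared library failed to load.
        return None
    # Assumption: this binding exists only in newer llama-cpp-python releases.
    check = getattr(llama_cpp, "llama_supports_gpu_offload", None)
    if check is None:
        return None
    return bool(check())


if __name__ == "__main__":
    print("GPU offload supported:", gpu_offload_supported())
```

If this prints False, the package was most likely built CPU-only, and n_gpu_layers would have no effect regardless of its value.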
What system are you using?
I had similar issues on WSL with Ubuntu. The problem was that the library did not find the NVIDIA toolkit.
The Ubuntu solution mentioned here worked for me. They also have a solution for Windows, but I did not test it.
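To see whether the toolkit is visible from your environment, something like the following can help. It is only a rough sketch: it checks whether `nvidia-smi` and `nvcc` are on PATH and whether `CUDA_HOME` or `CUDA_PATH` is set. The function name `find_cuda_tools` is hypothetical, and a missing entry is a hint rather than proof of a broken install.

```python
import os
import shutil


def find_cuda_tools():
    """Report where common CUDA tools are found; None means not found."""
    return {
        # Driver utility, normally present when the GPU driver is installed.
        "nvidia-smi": shutil.which("nvidia-smi"),
        # CUDA compiler, part of the toolkit used when building with GPU support.
        "nvcc": shutil.which("nvcc"),
        # Environment variables commonly used to point at the toolkit.
        "CUDA_HOME": os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH"),
    }


if __name__ == "__main__":
    for name, location in find_cuda_tools().items():
        print(f"{name}: {location or 'not found'}")
```

If `nvcc` shows as not found inside WSL, the build step probably could not see the toolkit either.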
Yes, I have CUDA 11.8. In the task manager I see only RAM and CPU usage, with or without the n_gpu_layers argument. I am using WSL.
I'm also having issues with the latest version; 0.2.33 works fine.
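When comparing versions like this, it helps to confirm which release is actually installed in the active environment. A minimal sketch using only the standard library follows. The helper name `installed_version` is made up for this example, and the package name is assumed to be `llama-cpp-python` as published on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="llama-cpp-python"):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("llama-cpp-python version:", installed_version())
```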
Same. There seem to be several threads that address this, but no solutions yet: https://github.com/abetlen/llama-cpp-python/issues/1123#issuecomment-2153405068 https://github.com/abetlen/llama-cpp-python/issues/1310#issuecomment-2153424941
I am using Llama() for a chatbot in the terminal, but when I set n_gpu_layers=-1 or any other number, the GPU is not used for computation. By comparison, when I enable it in LM Studio it works perfectly and fast. I want the same thing in the terminal. Does anyone know what the problem could be?
from llama_cpp import Llama
model = Llama(model_path="zephyr-7b-beta.Q4_K_M.gguf", n_gpu_layers=-1, n_ctx=2048, verbose=False)