Hi, we have recently integrated our models into llama-cpp-python directly. Here's how you can use it. Can you try it and see if it works now?
I tested it on my end with the following code, and the model loads using 4.835 GB of GPU VRAM.
from llama_cpp import Llama
from llama_cpp.llama_tokenizer import LlamaHFTokenizer

llm = Llama.from_pretrained(
    repo_id="meetkai/functionary-7b-v2-GGUF",
    filename="functionary-7b-v2.q4_0.gguf",
    chat_format="functionary-v2",
    tokenizer=LlamaHFTokenizer.from_pretrained("meetkai/functionary-7b-v2-GGUF"),
    n_ctx=4096,
    n_gpu_layers=-1,
)
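Once loaded, the model can be driven through the regular chat-completion API with tools. A minimal sketch (the get_current_weather tool and the user message below are placeholders for illustration, not something from this thread):

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the weather like in Istanbul?"},
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",  # placeholder tool for illustration
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"},
                    },
                    "required": ["city"],
                },
            },
        }
    ],
    tool_choice="auto",  # let the model decide whether to call the tool
)
print(response["choices"][0]["message"])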
Yes it works. Quick question: is there a way to load a local GGUF file instead of downloading it from the hub?
Sorry for the late reply, but yes: you can load a local GGUF file by initializing the Llama class directly. Here's a guide showing how, and a minimal example is sketched below.
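A minimal sketch of loading the same local file (the model_path here is just wherever the GGUF was downloaded to; the tokenizer is still fetched from the Hub, as in the snippet above):

from llama_cpp import Llama
from llama_cpp.llama_tokenizer import LlamaHFTokenizer

# Sketch: construct Llama directly from a local GGUF file instead of
# downloading it via Llama.from_pretrained().
llm = Llama(
    model_path="./functionary-7b-v2.q4_0.gguf",  # local path, adjust as needed
    chat_format="functionary-v2",
    tokenizer=LlamaHFTokenizer.from_pretrained("meetkai/functionary-7b-v2-GGUF"),
    n_ctx=4096,
    n_gpu_layers=-1,
)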
This is how I am loading the model in Python, but it only uses the CPU:
llm = Llama(
    model_path="./functionary-7b-v2.q4_0.gguf",
    n_ctx=4096,
    n_gpu_layers=50,
)
I have also tried to reinstall llama-cpp-python with the commands below, but that didn't help:
set CMAKE_ARGS="-DLLAMA_CUBLAS=on"
set FORCE_CMAKE=1
pip install --upgrade --verbose --force-reinstall llama-cpp-python --no-cache-dir
My GPU has only 8 GB of VRAM; could that be the reason? I saw in the readme that this model requires 24 GB of VRAM... However, other models such as Mistral load on my GPU just fine, so I am assuming that my CUDA installation is correct.
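A quick way to check whether the installed build supports GPU offload at all, before blaming the model size (a sketch, assuming a recent llama-cpp-python that exposes the low-level llama_supports_gpu_offload binding):

import llama_cpp

# If this prints False, the wheel was built without CUDA support and
# n_gpu_layers is effectively ignored, so inference falls back to the CPU.
print(llama_cpp.__version__)
print(llama_cpp.llama_supports_gpu_offload())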