Maximilian-Winter / llama-cpp-agent

The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM models, execute structured function calls and get structured output. It also works with models not fine-tuned for JSON output and function calls.

Using the 01_Basics example, the model is not loading on the GPU #57

Closed: imalros closed this issue 2 months ago

imalros commented 2 months ago

I've been messing around with this repo since this morning, reading the readme files and digging into the code. I wanted to see how fast it runs, so I started with chatbot_using_llama_cpp_python.py. But for some reason, the model isn't loading onto the GPU (I only see llm_load_tensors: CPU buffer size = 2281.66 MiB and no CUDA line), even though I've set n_gpu_layers=40. I'd share the script here, but the only things I've changed are the model path and setting predefined_messages_formatter_type to MessagesFormatterType.PHI_3.
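
For reference, this is roughly the part of the example that controls GPU offloading (a minimal sketch, not the exact script; the model path is a placeholder and the rest of the agent setup is left as in the repository's example):

from llama_cpp import Llama

# n_gpu_layers asks llama.cpp to offload that many layers to the GPU.
# It only has an effect if llama-cpp-python was built with CUDA support;
# otherwise the verbose load log shows only a CPU buffer, as reported above.
llama_model = Llama(
    "path/to/your-model.gguf",  # placeholder path
    n_ctx=4096,
    n_gpu_layers=40,
    verbose=True,
)
# The LlamaCppAgent is then created from this model with
# predefined_messages_formatter_type=MessagesFormatterType.PHI_3,
# unchanged from the example.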

Maximilian-Winter commented 2 months ago

@imalros I think you installed llama-cpp-python together with llama-cpp-agent? In that case you don't have GPU acceleration. You have to reinstall llama-cpp-python. On Windows with CUDA, do it like this:

pip uninstall llama-cpp-python
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir

On Linux it should be something like this, I think:

pip uninstall llama-cpp-python
CMAKE_ARGS=-DLLAMA_CUBLAS=on
FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir

Maximilian-Winter commented 2 months ago

@imalros I think I broke the Phi 3 template by accident. Will fix it now.

Maximilian-Winter commented 2 months ago

@imalros If you install the new version of my framework 0.2.1, you will have the correct Phi 3 template.

imalros commented 2 months ago

Thanks for the hint. I followed the steps you mentioned for Linux, but I got the same result. Here is the terminal output:

(myenv) (base) user1@myserver:~/Code/llama-cpp-agent$ CMAKE_ARGS=-DLLAMA_CUBLAS=on
(myenv) (base) user1@myserver:~/Code/llama-cpp-agent$ FORCE_CMAKE=1
(myenv) (base) user1@myserver:~/Code/llama-cpp-agent$ pip install llama-cpp-python --no-cache-dir
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.74.tar.gz (49.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.2/49.2 MB 26.7 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from llama-cpp-python) (4.9.0)
Requirement already satisfied: numpy>=1.20.0 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from llama-cpp-python) (1.26.4)
Requirement already satisfied: diskcache>=5.6.1 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from llama-cpp-python) (5.6.3)
Requirement already satisfied: jinja2>=2.11.3 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from llama-cpp-python) (3.1.3)
Requirement already satisfied: MarkupSafe>=2.0 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from jinja2>=2.11.3->llama-cpp-python) (2.1.5)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.74-cp311-cp311-linux_x86_64.whl size=3657012 sha256=d57bf14f448439ea216ef0ee6450e4370c2eb59533cefa3a0b851a97d158c49b
  Stored in directory: /tmp/pip-ephem-wheel-cache-8nehys8t/wheels/9b/ac/9a/7232ddf82e013b7234571c8ed5011125fb0ef4750d347306b8
Successfully built llama-cpp-python
Installing collected packages: llama-cpp-python
Successfully installed llama-cpp-python-0.2.74
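
Before loading the full model again, a quick way to check whether the freshly built wheel actually has GPU offload support (a small sketch; it assumes a llama-cpp-python version recent enough to expose llama_supports_gpu_offload):

# Prints True only if the installed llama-cpp-python build was compiled with GPU offload support.
from llama_cpp import llama_supports_gpu_offload
print(llama_supports_gpu_offload())
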
imalros commented 2 months ago

> @imalros If you install the new version of my framework 0.2.1, you will have the correct Phi 3 template.

Great! Thanks for the quick fix.

Maximilian-Winter commented 2 months ago

@imalros Maybe this will help you: https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#installation-configuration

Maximilian-Winter commented 2 months ago

I would try this:

CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python

imalros commented 2 months ago

You are right, CMAKE_ARGS="-DLLAMA_CUDA=on" is the correct CMake argument. I came across this one-liner and it seems it is fixed now:

CUDACXX=/usr/local/cuda-12.4/bin/nvcc CMAKE_ARGS="-DLLAMA_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade