Maximilian-Winter / llama-cpp-agent

The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM models, execute structured function calls and get structured output. It also works with models not fine-tuned for JSON output and function calls.

Using the 01_Basics example, the model is not loading on the GPU #57

Closed: imalros closed this issue 2 months ago

imalros commented 2 months ago

I've been messing around with this repo since this morning, reading the readme files and digging into the code. I wanted to see how fast it runs, so I started with chatbot_using_llama_cpp_python.py. But for some reason, the model isn't loading onto the GPU (I only see llm_load_tensors: CPU buffer size = 2281.66 MiB and no CUDA line), even though I've set n_gpu_layers=40. I'd share the script here, but the only things I've changed are the model path and setting predefined_messages_formatter_type to MessagesFormatterType.PHI_3.
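
For reference, this is roughly the part of the example that controls GPU offloading (a minimal sketch, not the exact script; the model path is a placeholder and the rest of the agent setup is left as in the repository's example):

from llama_cpp import Llama

# n_gpu_layers asks llama.cpp to offload that many layers to the GPU.
# It only has an effect if llama-cpp-python was built with CUDA support;
# otherwise the verbose load log shows only a CPU buffer, as reported above.
llama_model = Llama(
    "path/to/your-model.gguf",  # placeholder path
    n_ctx=4096,
    n_gpu_layers=40,
    verbose=True,
)
# The LlamaCppAgent is then created from this model with
# predefined_messages_formatter_type=MessagesFormatterType.PHI_3,
# unchanged from the example.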

Maximilian-Winter commented 2 months ago

@imalros I think you installed llama-cpp-python together with llama-cpp-agent? In that case you don't have GPU acceleration. You have to reinstall llama-cpp-python. On Windows with CUDA, do it like this:

pip uninstall llama-cpp-python
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir

On Linux it should be something like this, I think:

pip uninstall llama-cpp-python
CMAKE_ARGS=-DLLAMA_CUBLAS=on
FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir

Maximilian-Winter commented 2 months ago

@imalros I think I broke the Phi 3 template by accident. Will fix it now.

Maximilian-Winter commented 2 months ago

@imalros If you install the new version of my framework 0.2.1, you will have the correct Phi 3 template.

imalros commented 2 months ago

Thanks for the hint. I followed the steps you mentioned for Linux, but I got the same result. Here is the terminal output:

(myenv) (base) user1@myserver:~/Code/llama-cpp-agent$ CMAKE_ARGS=-DLLAMA_CUBLAS=on
(myenv) (base) user1@myserver:~/Code/llama-cpp-agent$ FORCE_CMAKE=1
(myenv) (base) user1@myserver:~/Code/llama-cpp-agent$ pip install llama-cpp-python --no-cache-dir
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.74.tar.gz (49.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.2/49.2 MB 26.7 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from llama-cpp-python) (4.9.0)
Requirement already satisfied: numpy>=1.20.0 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from llama-cpp-python) (1.26.4)
Requirement already satisfied: diskcache>=5.6.1 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from llama-cpp-python) (5.6.3)
Requirement already satisfied: jinja2>=2.11.3 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from llama-cpp-python) (3.1.3)
Requirement already satisfied: MarkupSafe>=2.0 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from jinja2>=2.11.3->llama-cpp-python) (2.1.5)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.74-cp311-cp311-linux_x86_64.whl size=3657012 sha256=d57bf14f448439ea216ef0ee6450e4370c2eb59533cefa3a0b851a97d158c49b
  Stored in directory: /tmp/pip-ephem-wheel-cache-8nehys8t/wheels/9b/ac/9a/7232ddf82e013b7234571c8ed5011125fb0ef4750d347306b8
Successfully built llama-cpp-python
Installing collected packages: llama-cpp-python
Successfully installed llama-cpp-python-0.2.74
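
Before loading the full model again, a quick way to check whether the freshly built wheel actually has GPU offload support (a small sketch; it assumes a llama-cpp-python version recent enough to expose llama_supports_gpu_offload):

# Prints True only if the installed llama-cpp-python build was compiled with GPU offload support.
from llama_cpp import llama_supports_gpu_offload
print(llama_supports_gpu_offload())
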
imalros commented 2 months ago

> @imalros If you install the new version of my framework 0.2.1, you will have the correct Phi 3 template.

Great! Thanks for the quick fix.

Maximilian-Winter commented 2 months ago

@imalros Maybe this will help you: https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#installation-configuration

Maximilian-Winter commented 2 months ago

I would try this:

CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python

imalros commented 2 months ago

You are right, CMAKE_ARGS="-DLLAMA_CUDA=on" is the correct CMake argument. I came across this one-liner and it seems it is fixed now:

CUDACXX=/usr/local/cuda-12.4/bin/nvcc CMAKE_ARGS="-DLLAMA_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade