Closed: imalros closed this issue 2 months ago
@imalros I think you installed llama-cpp-python together with llama-cpp-agent? Then you don't have GPU acceleration. You have to reinstall llama-cpp-python. On Windows with CUDA, do it like this:
pip uninstall llama-cpp-python
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir
On Linux it should be something like this, I think:
pip uninstall llama-cpp-python
export CMAKE_ARGS=-DLLAMA_CUBLAS=on
export FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir
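(Editorial aside, not part of the original commands: in a POSIX shell, a bare `VAR=value` line sets a shell variable but does not export it to child processes such as pip, so assignments on their own lines have no effect on the install unless exported or written inline before the command. A minimal sketch:)

```shell
# Bare assignment: visible to this shell, but NOT inherited by child processes.
CMAKE_ARGS=-DLLAMA_CUBLAS=on
sh -c 'echo "child sees: ${CMAKE_ARGS:-<unset>}"'   # child sees: <unset>

# Exported (or prefixed inline before the command): inherited by the child.
export CMAKE_ARGS=-DLLAMA_CUBLAS=on
sh -c 'echo "child sees: ${CMAKE_ARGS:-<unset>}"'   # child sees: -DLLAMA_CUBLAS=on
```

This is why `CMAKE_ARGS="..." FORCE_CMAKE=1 pip install ...` on a single line also works: inline assignments are passed into the environment of that one command.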
@imalros I think I broke the Phi 3 template by accident. Will fix it now.
@imalros If you install the new version of my framework 0.2.1, you will have the correct Phi 3 template.
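(For reference, a minimal sketch of the Phi 3 chat format that the fixed template should roughly correspond to, based on the special tokens documented for Phi-3. The exact whitespace and role handling inside llama-cpp-agent's MessagesFormatterType.PHI_3 may differ, so treat this as an illustration, not the framework's implementation:)

```python
def format_phi3(messages):
    """Render a chat the way Phi-3 expects it: each turn wrapped in
    <|role|> ... <|end|> tokens, then an open assistant tag for generation."""
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    parts.append("<|assistant|>\n")
    return "".join(parts)

prompt = format_phi3([{"role": "user", "content": "What is 1 + 1?"}])
# -> "<|user|>\nWhat is 1 + 1?<|end|>\n<|assistant|>\n"
```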
Thanks for the hint. I followed the steps you mentioned for Linux, but I got the same result. Here are the terminal outputs:
(myenv) (base) user1@myserver:~/Code/llama-cpp-agent$ CMAKE_ARGS=-DLLAMA_CUBLAS=on
(myenv) (base) user1@myserver:~/Code/llama-cpp-agent$ FORCE_CMAKE=1
(myenv) (base) user1@myserver:~/Code/llama-cpp-agent$ pip install llama-cpp-python --no-cache-dir
Collecting llama-cpp-python
Downloading llama_cpp_python-0.2.74.tar.gz (49.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49.2/49.2 MB 26.7 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: typing-extensions>=4.5.0 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from llama-cpp-python) (4.9.0)
Requirement already satisfied: numpy>=1.20.0 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from llama-cpp-python) (1.26.4)
Requirement already satisfied: diskcache>=5.6.1 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from llama-cpp-python) (5.6.3)
Requirement already satisfied: jinja2>=2.11.3 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from llama-cpp-python) (3.1.3)
Requirement already satisfied: MarkupSafe>=2.0 in /home/user1/miniconda3/envs/myenv/lib/python3.11/site-packages (from jinja2>=2.11.3->llama-cpp-python) (2.1.5)
Building wheels for collected packages: llama-cpp-python
Building wheel for llama-cpp-python (pyproject.toml) ... done
Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.74-cp311-cp311-linux_x86_64.whl size=3657012 sha256=d57bf14f448439ea216ef0ee6450e4370c2eb59533cefa3a0b851a97d158c49b
Stored in directory: /tmp/pip-ephem-wheel-cache-8nehys8t/wheels/9b/ac/9a/7232ddf82e013b7234571c8ed5011125fb0ef4750d347306b8
Successfully built llama-cpp-python
Installing collected packages: llama-cpp-python
Successfully installed llama-cpp-python-0.2.74
Great! Thanks for the quick fix.
@imalros Maybe this will help you: https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#installation-configuration
I would try this:
CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
You are right, CMAKE_ARGS="-DLLAMA_CUDA=on" is the correct CMake argument.
I came across this one-liner, and with it the problem seems to be fixed now:
CUDACXX=/usr/local/cuda-12.4/bin/nvcc CMAKE_ARGS="-DLLAMA_CUDA=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade
I've been messing around with this repo since this morning, reading the README files and digging into the code. I wanted to see how fast it runs, so I kicked off with chatbot_using_llama_cpp_python.py as my starting point. But for some reason, the model isn't loading into the GPU (I can only see llm_load_tensors: CPU buffer size = 2281.66 MiB and no CUDA line), even though I've set n_gpu_layers=40. I'd share the script here, but really, the only things I've changed are the model path and setting predefined_messages_formatter_type to MessagesFormatterType.PHI_3.
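(One way to check whether layers actually went to the GPU is to look at the llm_load_tensors lines in the load log: a CPU-only build prints only a CPU buffer size, as above, while a CUDA build also prints CUDA buffer lines. A small, hedged helper sketch; the log format shown matches what recent llama.cpp versions print, but it can change between releases:)

```python
import re

def log_shows_gpu_offload(log: str) -> bool:
    """Return True if the llama.cpp load log contains a CUDA buffer line,
    i.e. at least some tensors were offloaded to the GPU."""
    return re.search(r"llm_load_tensors:.*CUDA\d* buffer size", log) is not None

cpu_only_log = "llm_load_tensors: CPU buffer size = 2281.66 MiB"
print(log_shows_gpu_offload(cpu_only_log))  # False: no CUDA line, CPU-only build
```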