bannsec opened 1 year ago
Modern llama.cpp can offload model layers to the GPU via cuBLAS. This has basically become my go-to way of running models because of its efficiency and ease of use. It looks like localGPT doesn't support this yet, even though it supports CPU and MPS.

This is a really good point @bannsec! I added support for this in #253 — could you try that branch? You'll probably have to reinstall with

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir
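For reference, here's a minimal sketch of what the cuBLAS build enables: once llama-cpp-python is compiled with `-DLLAMA_CUBLAS=on`, the `Llama` constructor accepts an `n_gpu_layers` argument to offload that many transformer layers to the GPU. The model path and layer count below are placeholders, not values from this thread:

```python
from llama_cpp import Llama

# Placeholder model path — substitute your own GGML/GGUF file.
llm = Llama(
    model_path="./models/your-model.bin",
    n_gpu_layers=32,  # layers to offload to the GPU; use a large value (or -1 in newer versions) to offload all
    n_ctx=2048,       # context window size
)

output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(output["choices"][0]["text"])
```

If the cuBLAS build is active, llama.cpp logs the offloaded layer count (e.g. a `llm_load_tensors: offloaded ...` line) on startup; if the model loads with no GPU mention, the wheel was likely built CPU-only and needs the reinstall command above.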