bannsec opened 1 year ago
Modern llama.cpp can offload model layers to the GPU via cuBLAS. This has basically become my go-to way of running models because of its efficiency and ease of use. It looks like localGPT doesn't support this yet, even though it supports CPU and MPS.

This is a really good point @bannsec! I added support for this in #253 — could you try that branch? You'll probably have to reinstall with

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install -U llama-cpp-python --no-cache-dir
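For reference, here's a minimal sketch of what the cuBLAS build enables: once llama-cpp-python is compiled with `-DLLAMA_CUBLAS=on`, the `Llama` constructor accepts an `n_gpu_layers` argument to offload that many transformer layers to the GPU. The model path and layer count below are placeholders, not values from this thread:

```python
from llama_cpp import Llama

# Placeholder model path — substitute your own GGML/GGUF file.
llm = Llama(
    model_path="./models/your-model.bin",
    n_gpu_layers=32,  # layers to offload to the GPU; use a large value (or -1 in newer versions) to offload all
    n_ctx=2048,       # context window size
)

output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(output["choices"][0]["text"])
```

If the cuBLAS build is active, llama.cpp logs the offloaded layer count (e.g. a `llm_load_tensors: offloaded ...` line) on startup; if the model loads with no GPU mention, the wheel was likely built CPU-only and needs the reinstall command above.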