erswelljustin opened 9 months ago
GGUF (formerly GGML) is only for CPU. If you are using CUDA, you need the GPTQ models.
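For reference, a minimal sketch of what loading a GPTQ model on CUDA looks like with the auto-gptq library (the repo ID is the one discussed later in this thread; model_basename is an assumption and must match the .safetensors file in the repo, minus the extension):

from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "TheBloke/WizardLM-7B-uncensored-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
# basename of the quantized weights file, without the .safetensors extension
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    model_basename="model",   # assumes the repo's weights file is model.safetensors
    use_safetensors=True,
    device="cuda:0",
)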
In my experience on Ubuntu 22.04, BLAS=0 happened when my build of llama-cpp-python failed to find my CUDA toolkit installation (including cublas.h) in an Anaconda environment. I used the --verbose flag to see the build logs:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir --verbose
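Once the build succeeds, one way to confirm cuBLAS was actually compiled in is to load any local GGUF model with verbose output and look for "BLAS = 1" in the log (a sketch; the model path is a placeholder for whatever model file you have):

from llama_cpp import Llama

# verbose=True prints system info at load time; look for "BLAS = 1" in the output
llm = Llama(
    model_path="./models/your-model.gguf",  # placeholder: any local GGUF file
    n_gpu_layers=32,                        # offload layers to the GPU
    verbose=True,
)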
@erswelljustin As mentioned above, GGUF is a great option if you are running localGPT on Apple silicon or CPU. If you have access to an NVIDIA GPU, I would recommend using GPTQ models. Also check that you have PyTorch installed and have access to CUDA. In the same virtual env, open a Python shell and run this code:
import torch
print(torch.cuda.is_available())
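If that prints True, you can also confirm which GPU and CUDA version torch sees (optional extra checks):

import torch
print(torch.cuda.get_device_name(0))  # e.g. the RTX 4080 mentioned below
print(torch.version.cuda)             # CUDA version PyTorch was built against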
This allowed me to use CPU and GPU simultaneously with GGUF, on Windows:
Set the environment variables properly:
$Env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
$Env:FORCE_CMAKE="1"
Check that it worked:
echo $Env:CMAKE_ARGS
Uninstall the previous version of llama-cpp-python:
pip uninstall llama-cpp-python
Install the proper version:
pip install llama-cpp-python==0.1.83 --no-cache-dir
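After reinstalling, you can check whether the new build reports BLAS without loading a model (a sketch; llama_print_system_info is the low-level llama.cpp binding, assuming your version exposes it):

import llama_cpp
# raw llama.cpp binding; returns bytes like b"AVX = 1 | ... | BLAS = 1 | ..."
print(llama_cpp.llama_print_system_info().decode())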
@erswelljustin I would say, check your llama-cpp-python version.
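A quick way to print the installed version from Python (importlib.metadata is in the standard library):

import importlib.metadata
# should report 0.1.83 after the reinstall above
print(importlib.metadata.version("llama-cpp-python"))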
Thanks all for your help, I will report back.
@PromtEngineer I am trying to use one of the models suggested in constants.py for GPTQ, as per your reply. I have also checked torch.cuda.is_available(), which returns True; however, I am getting an error that says:
FileNotFoundError: Could not find model in TheBloke/WizardLM-7B-uncensored-GPTQ
It is true that this isn't in the models folder, but I felt sure the tutorial said the model would be downloaded. I have uncommented lines 158 & 159 and commented out lines 98 & 99 of constants.py, and I am running:
python3 run_localGPT.py --device_type cuda --show_sources --use_history
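If the download itself is in doubt, one way to rule that out is to pre-fetch the repo into your Hugging Face cache and retry (a sketch using huggingface_hub, which transformers already depends on):

from huggingface_hub import snapshot_download
# downloads every file in the repo into the local HF cache
snapshot_download(repo_id="TheBloke/WizardLM-7B-uncensored-GPTQ")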
I have updated MODEL_BASENAME to "model.safetensors" and it is working now. Thanks for your help!
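For anyone hitting the same FileNotFoundError, the working combination in constants.py looks roughly like this (values taken from this thread; the exact line numbers may differ in your copy):

# GPTQ configuration in constants.py that resolved the error
MODEL_ID = "TheBloke/WizardLM-7B-uncensored-GPTQ"
MODEL_BASENAME = "model.safetensors"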
For Windows, BLAS=0 if we keep the double quotation marks on; it works with the GPU and shows BLAS=1 if we run the commands without double quotation marks. Note that setx writes to the persistent environment and does not affect the current shell, so open a new terminal before running pip install. The below worked for me:
setx CMAKE_ARGS -DLLAMA_CUBLAS=on
setx FORCE_CMAKE 1
pip install llama-cpp-python==0.1.83 --no-cache-dir
The GGUF CPU+GPU steps above helped with my issue.
Hi @PromtEngineer
I have followed the README instructions and also watched your latest YouTube video, but even if I set the --device_type to cuda manually when running run_localGPT.py or run_localGPT_API, the BLAS value is always shown as BLAS = 0.
I am running Ubuntu 22.04 and an NVIDIA RTX 4080. This is my lspci output for reference. I am using the following models in constants.py.
Can you advise, as it currently runs off the CPU and ideally I'd like it to run off the very capable GPU.
Thanks!