PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.
Apache License 2.0

CUDA illegal memory access in run_localGPT.py but not localGPT_UI.py after solving pydantic.error_wrappers.ValidationError #500

Open rcantada opened 1 year ago

rcantada commented 1 year ago

Context

OS: Ubuntu 22.04
CPU: AMD FX-8320E
GPU: Quadro P4000
Python environment: Anaconda

Issue

When I run the following in a terminal:

python run_localGPT.py

and then ask a question, I get the following:

Enter a query: Who are you?

CUDA error 700 at /tmp/pip-install-0pktutdd/llama-cpp-python_f346c13605c14952878365efd2efe0b4/vendor/llama.cpp/ggml-cuda.cu:6540: an illegal memory access was encountered
current device: 0
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit

Reducing the batch size to 8 in constants.py did not make a difference.
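For reference, a sketch of what that setting looks like in constants.py (the variable names N_GPU_LAYERS and N_BATCH are from my copy of localGPT and may differ between versions):

# constants.py (excerpt)
N_GPU_LAYERS = 100  # layers llama.cpp offloads to the GPU
N_BATCH = 8         # lowered from the original value; made no difference here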

But I do not get any error when I run

streamlit run localGPT_UI.py

and ask my query via the web GUI.

How I solved the pydantic.error_wrappers.ValidationError

I got the following app exception as reported by others:

Traceback (most recent call last):
  File "/home/username/.local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
  File "/home/username/opt/localGPT/localGPT_UI.py", line 88, in <module>
    QA = RetrievalQA.from_chain_type(
  File "/home/username/.local/lib/python3.10/site-packages/langchain/chains/retrieval_qa/base.py", line 100, in from_chain_type
    combine_documents_chain = load_qa_chain(
  File "/home/username/.local/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py", line 249, in load_qa_chain
    return loader_mapping[chain_type](
  File "/home/username/.local/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py", line 73, in _load_stuff_chain
    llm_chain = LLMChain(
  File "/home/username/.local/lib/python3.10/site-packages/langchain/load/serializable.py", line 74, in __init__
    super().__init__(**kwargs)
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for LLMChain
llm
  none is not an allowed value (type=type_error.none.not_allowed)
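The error itself just means the llm passed to LLMChain was None, i.e. the model never loaded; in my case because llama-cpp-python was not built correctly. A minimal illustration of the same failure (a sketch against the old langchain 0.0.x / pydantic v1 API shown in the traceback, not localGPT code):

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(input_variables=["q"], template="{q}")

# Passing llm=None, as happens when model loading silently fails,
# reproduces the error: LLMChain requires a non-None language model.
LLMChain(llm=None, prompt=prompt)
# pydantic.error_wrappers.ValidationError: 1 validation error for LLMChain
# llm
#   none is not an allowed value (type=type_error.none.not_allowed)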

Solution attempted

I ran the following in a terminal. Note that this is specific to my CPU, which does not support AVX2:

CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DLLAMA_AVX2=OFF -DLLAMA_F16C=ON" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir --verbose

Another example, from some other thread, for E5645 (Westmere) Xeons:

CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir --verbose

And the generic recompilation of llama-cpp-python:

CMAKE_ARGS="-DLLAMA_CUBLAS=1 FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir --verbose

If you get the following after running localGPT:

Illegal instruction (core dumped)

check which of these instruction-set flags your CPU supports and turn off the ones it does not.
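On Linux, one quick way to list which of the relevant features the CPU reports (a sketch; the flag names in /proc/cpuinfo are lowercase):

grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(avx|avx2|f16c|fma)$'

Any feature missing from the output should get its corresponding -DLLAMA_* option set to OFF.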

Note also that you need the latest llama-cpp-python (0.2.6 at the time of this post), which supports GGUF; the old 0.1.78 version did not seem to support it.
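You can confirm the installed version with the one-liner below (assuming the package exposes __version__, which the 0.2.x releases do):

python -c "import llama_cpp; print(llama_cpp.__version__)"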

rcantada commented 1 year ago

This error appears to be unrelated to the illegal memory access on python run_localGPT.py:

/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit

It went away after downgrading pyarrow from 12.0.1 to 11.0.0:

pip install --upgrade --force-reinstall pyarrow==11.0.0
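To verify the downgrade took effect:

python -c "import pyarrow; print(pyarrow.__version__)"

which should now print 11.0.0.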

But the illegal memory access still happens using:

MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF" MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"

Using a GPTQ model does not cause the illegal memory access, so this model runs fine in run_localGPT.py:

MODEL_ID = "TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ" MODEL_BASENAME = "model.safetensors"

I wonder what localGPT_UI.py does differently with memory that allows it to use Llama-2-7b-Chat-GGUF while run_localGPT.py can't.