PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.
Apache License 2.0

CUDA illegal memory access in run_localGPT.py but not localGPT_UI.py after solving pydantic.error_wrappers.ValidationError #500

Open rcantada opened 1 year ago

rcantada commented 1 year ago

Context

OS: Ubuntu 22.04
CPU: AMD FX-8320E
GPU: Quadro P4000
Python environment: Anaconda

Issue

When I run the following in a terminal:

python run_localGPT.py

and then ask a question, I get the following:

Enter a query: Who are you?

CUDA error 700 at /tmp/pip-install-0pktutdd/llama-cpp-python_f346c13605c14952878365efd2efe0b4/vendor/llama.cpp/ggml-cuda.cu:6540: an illegal memory access was encountered
current device: 0
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit

Reducing the batch size to 8 in constants.py did not make a difference.
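For reference, a sketch of what that setting looks like in constants.py (the variable names N_GPU_LAYERS and N_BATCH are from my copy of localGPT and may differ between versions):

# constants.py (excerpt)
N_GPU_LAYERS = 100  # layers llama.cpp offloads to the GPU
N_BATCH = 8         # lowered from the original value; made no difference here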

But I do not get any error when I run

streamlit run localGPT_UI.py

and ask my query via the web GUI.

How I solved the pydantic.error_wrappers.ValidationError

I got the following app exception as reported by others:

Traceback (most recent call last):
  File "/home/username/.local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
  File "/home/username/opt/localGPT/localGPT_UI.py", line 88, in <module>
    QA = RetrievalQA.from_chain_type(
  File "/home/username/.local/lib/python3.10/site-packages/langchain/chains/retrieval_qa/base.py", line 100, in from_chain_type
    combine_documents_chain = load_qa_chain(
  File "/home/username/.local/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py", line 249, in load_qa_chain
    return loader_mapping[chain_type](
  File "/home/username/.local/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py", line 73, in _load_stuff_chain
    llm_chain = LLMChain(
  File "/home/username/.local/lib/python3.10/site-packages/langchain/load/serializable.py", line 74, in __init__
    super().__init__(**kwargs)
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for LLMChain
llm
  none is not an allowed value (type=type_error.none.not_allowed)
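The error itself just means the llm passed to LLMChain was None, i.e. the model never loaded; in my case because llama-cpp-python was not built correctly. A minimal illustration of the same failure (a sketch against the old langchain 0.0.x / pydantic v1 API shown in the traceback, not localGPT code):

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(input_variables=["q"], template="{q}")

# Passing llm=None, as happens when model loading silently fails,
# reproduces the error: LLMChain requires a non-None language model.
LLMChain(llm=None, prompt=prompt)
# pydantic.error_wrappers.ValidationError: 1 validation error for LLMChain
# llm
#   none is not an allowed value (type=type_error.none.not_allowed)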

Solution attempted

I ran the following in a terminal. Note that this is specific to my CPU, which does not support AVX2:

CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DLLAMA_AVX2=OFF -DLLAMA_F16C=ON" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir --verbose

Another example, from some other thread, for E5645 (Westmere) Xeons:

CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir --verbose

And the generic recompilation of llama-cpp-python:

CMAKE_ARGS="-DLLAMA_CUBLAS=1 FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir --verbose

If you get the following after running localGPT:

Illegal instruction (core dumped)

check which of these instruction-set flags your CPU supports and turn off the ones it does not.
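On Linux, one quick way to list which of the relevant features the CPU reports (a sketch; the flag names in /proc/cpuinfo are lowercase):

grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(avx|avx2|f16c|fma)$'

Any feature missing from the output should get its corresponding -DLLAMA_* option set to OFF.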

Note also that you need the latest llama-cpp-python (0.2.6 at the time of this post), which supports GGUF; the old 0.1.78 version did not seem to support it.
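You can confirm the installed version with the one-liner below (assuming the package exposes __version__, which the 0.2.x releases do):

python -c "import llama_cpp; print(llama_cpp.__version__)"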

rcantada commented 1 year ago

This error appears to be unrelated to the illegal memory access on python run_localGPT.py:

/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit

It went away after downgrading pyarrow from 12.0.1 to 11.0.0:

pip install --upgrade --force-reinstall pyarrow==11.0.0
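To verify the downgrade took effect:

python -c "import pyarrow; print(pyarrow.__version__)"

which should now print 11.0.0.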

But the illegal memory access still happens using:

MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF" MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"

Using a GPTQ model does not cause the illegal memory access, so this model runs fine in run_localGPT.py:

MODEL_ID = "TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ" MODEL_BASENAME = "model.safetensors"

I wonder what localGPT_UI.py does differently with memory that allows it to use Llama-2-7b-Chat-GGUF while run_localGPT.py can't.