rcantada opened 1 year ago
This error appears to be unrelated to the illegal memory access seen when running python run_localGPT.py:
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
It went away after downgrading pyarrow from 12.0.1 to 11.0.0
pip install --upgrade --force-reinstall pyarrow==11.0.0
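As an alternative to downgrading (an untested sketch; it assumes pyarrow.fs.finalize_s3 is available in your pyarrow version), S3 can be finalized explicitly at interpreter exit:

import atexit
import pyarrow.fs

# Untested sketch: register pyarrow's S3 finalizer so it runs at exit,
# which should silence the "FinalizeS3 was not called" warning.
atexit.register(pyarrow.fs.finalize_s3)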
But the illegal memory access still happens using:
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"
Using a GPTQ model does not cause the illegal memory access, so the following model runs fine in run_localGPT.py:
MODEL_ID = "TheBloke/Wizard-Vicuna-7B-Uncensored-GPTQ"
MODEL_BASENAME = "model.safetensors"
I wonder what localGPT_UI.py is doing differently with memory that lets it use Llama-2-7b-Chat-GGUF when run_localGPT.py can't.
Context
os: Ubuntu 22.04
cpu: AMD FX-8320E
gpu: Quadro P4000
python environment: anaconda
Issue
When I run the following in a terminal:
python run_localGPT.py
and then ask a question, I get the following:
Enter a query: Who are you?
CUDA error 700 at /tmp/pip-install-0pktutdd/llama-cpp-python_f346c13605c14952878365efd2efe0b4/vendor/llama.cpp/ggml-cuda.cu:6540: an illegal memory access was encountered
current device: 0
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
Reducing the batch size to 8 in constants.py did not make a difference.
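For reference, that change amounts to something like the following in constants.py (the N_BATCH name is assumed from the localGPT repo; check your copy of the file):

# constants.py -- sketch; the N_BATCH name is assumed, check your checkout
N_BATCH = 8  # reduced from the default; did not fix the illegal memory access here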
But I do not get any error when I run
streamlit run localGPT_UI.py
and ask my query via the web GUI.
How I solved the pydantic.error_wrappers.ValidationError
I got the following app exception as reported by others:
Traceback (most recent call last):
  File "/home/username/.local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
  File "/home/username/opt/localGPT/localGPT_UI.py", line 88, in <module>
    QA = RetrievalQA.from_chain_type(
  File "/home/username/.local/lib/python3.10/site-packages/langchain/chains/retrieval_qa/base.py", line 100, in from_chain_type
    combine_documents_chain = load_qa_chain(
  File "/home/username/.local/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py", line 249, in load_qa_chain
    return loader_mapping[chain_type](
  File "/home/username/.local/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py", line 73, in _load_stuff_chain
    llm_chain = LLMChain(
  File "/home/username/.local/lib/python3.10/site-packages/langchain/load/serializable.py", line 74, in __init__
    super().__init__(**kwargs)
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for LLMChain
llm
  none is not an allowed value (type=type_error.none.not_allowed)
Solution attempted
Ran the following in a terminal. Note that this is specific to my CPU, which does not support AVX2:
CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DLLAMA_AVX2=OFF -DLLAMA_F16C=ON" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir --verbose
Another example, from another thread, for E5645 (Westmere) Xeons:
CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir --verbose
The generic recompilation of llama-cpp-python:
CMAKE_ARGS="-DLLAMA_CUBLAS=1" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir --verbose
Check what flags your CPU supports if you get the following after running localGPT:
Illegal instruction (core dumped)
and turn off what is not supported.
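A minimal sketch of one way to check the relevant flags on Linux, by reading /proc/cpuinfo (the feature names avx/avx2/f16c/fma map to the -DLLAMA_* options above):

# Sketch: report which SIMD features this CPU advertises, so you know
# which -DLLAMA_* options to turn OFF when rebuilding llama-cpp-python.
FEATURES = ("avx", "avx2", "f16c", "fma")

cpu_flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            cpu_flags = set(line.split(":", 1)[1].split())
            break

for feature in FEATURES:
    if feature in cpu_flags:
        print(f"{feature}: supported")
    else:
        print(f"{feature}: NOT supported -> add -DLLAMA_{feature.upper()}=OFF")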
Note also that you need a recent llama-cpp-python (0.2.6 at the time of this post) that supports GGUF; the old 0.1.78 version did not seem to support it.
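To confirm which version is installed (assuming llama_cpp exposes __version__, as recent releases do):

import llama_cpp

# Print the installed llama-cpp-python version; 0.2.x or newer is needed for GGUF.
print(llama_cpp.__version__)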