chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge-based RAG and Agent application built on Langchain and language models such as ChatGLM, Qwen, and Llama
Apache License 2.0
31.67k stars 5.52k forks

[BUG] Reading the file fails when uploading a PDF to the knowledge base #4659

Open GreenyGo1 opened 2 months ago

GreenyGo1 commented 2 months ago

ERROR | chatchat.server.knowledge_base.utils:files2docs_in_thread_file2docs:419 - TypeError: error while loading documents from file XXX.pdf: Cannot handle this data type: (1, 1, 1), |u1

GreenyGo1 commented 2 months ago

What is causing this? Most of the files I prepared run into this problem.

LonsonZheng commented 2 months ago

I get the same error when uploading CSV files. Have you solved it? Could the file size be the cause?

HuoShengLiangIT commented 2 months ago

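My PDF files report: ERROR | chatchat.server.knowledge_base.utils:files2docs_in_thread_file2docs:419 - TypeError: error while loading documents from file xxx.pdf: Cannot handle this data type: (1, 1, 1), |u1. My guess is that the PDF contains nonstandard text or images; some PDFs load fine and others do not.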
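For context on the message itself: "Cannot handle this data type: (1, 1, 1), |u1" is the TypeError that Pillow's Image.fromarray raises when it is given a uint8 array of shape (height, width, 1). That would fit the guess above about images inside the PDF (for example single-channel grayscale images), although nothing in this thread confirms where in the loader the conversion happens. A minimal sketch of the usual workaround, assuming the image is available as a NumPy array; to_pil_image is a hypothetical helper written for illustration, not a function from Langchain-Chatchat:

```python
import numpy as np
from PIL import Image


def to_pil_image(pixels: np.ndarray) -> Image.Image:
    """Convert an (H, W) or (H, W, C) uint8 array to a PIL image.

    Hypothetical helper for illustration only.
    """
    # Pillow has no mode mapping for a trailing channel axis of length 1,
    # so drop that axis and let the array be read as 8-bit grayscale ("L").
    if pixels.ndim == 3 and pixels.shape[2] == 1:
        pixels = pixels[:, :, 0]
    return Image.fromarray(pixels)


# A 4-row by 6-column grayscale image stored with an explicit channel axis.
gray = np.zeros((4, 6, 1), dtype=np.uint8)
image = to_pil_image(gray)
print(image.mode, image.size)
```

Arrays with three or four channels pass through unchanged, so the helper only affects the single-channel case.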

Zhuytt20 commented 3 weeks ago

Same here, uploading a PDF fails:
INFO: 127.0.0.1:50442 - "GET /tools HTTP/1.1" 200 OK
2024-09-29 19:39:35,001 httpx 34944 INFO HTTP Request: GET http://127.0.0.1:7861/tools "HTTP/1.1 200 OK"
INFO: 127.0.0.1:41042 - "GET /knowledge_base/list_knowledge_bases HTTP/1.1" 200 OK
2024-09-29 19:40:45,269 httpx 34944 INFO HTTP Request: GET http://127.0.0.1:7861/knowledge_base/list_knowledge_bases "HTTP/1.1 200 OK"
2024-09-29 19:40:45.271 Uncaught app exception
Traceback (most recent call last):
  File "/root/miniconda3/envs/myenv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 600, in _run_script
    exec(code, module.__dict__)
  File "/root/miniconda3/envs/myenv/lib/python3.11/site-packages/chatchat/webui.py", line 71, in <module>
    kb_chat(api=api)
  File "/root/miniconda3/envs/myenv/lib/python3.11/site-packages/chatchat/webui_pages/kb_chat.py", line 119, in kb_chat
    selected_kb = st.selectbox(
                  ^^^^^^^^^^^^^
  File "/root/miniconda3/envs/myenv/lib/python3.11/site-packages/streamlit/runtime/metrics_util.py", line 397, in wrapped_func
    result = non_optional_func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/myenv/lib/python3.11/site-packages/streamlit/elements/widgets/selectbox.py", line 203, in selectbox
    return self._selectbox(
           ^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/myenv/lib/python3.11/site-packages/streamlit/elements/widgets/selectbox.py", line 305, in _selectbox
    serialized_value = serde.serialize(widget_state.value)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/myenv/lib/python3.11/site-packages/streamlit/elements/widgets/selectbox.py", line 66, in serialize
    return index_(self.options, v)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/myenv/lib/python3.11/site-packages/streamlit/util.py", line 155, in index_
    raise ValueError(f"{str(x)} is not in iterable")
ValueError: samples is not in iterable
INFO: 127.0.0.1:41058 - "GET /knowledge_base/list_knowledge_bases HTTP/1.1" 200 OK
2024-09-29 19:40:45,807 httpx 34944 INFO HTTP Request: GET http://127.0.0.1:7861/knowledge_base/list_knowledge_bases "HTTP/1.1 200 OK"
2024-09-29 19:41:03.013 | INFO | chatchat.server.knowledge_base.utils:file2docs:336 - RapidOCRPDFLoader used for /root/data/temp/c24b6ffe59ac485f866a42f921d0f637/药包材-原始记录.pdf
2024-09-29 19:41:03.033 | ERROR | chatchat.server.knowledge_base.utils:get_loader:192 - ImportError: error while looking up loader RapidOCRPDFLoader for file /root/data/temp/c24b6ffe59ac485f866a42f921d0f637/药包材-原始记录.pdf: libGL.so.1: cannot open shared object file: No such file or directory
2024-09-29 19:41:04.979 | INFO | chatchat.server.knowledge_base.kb_cache.faiss_cache:load_vector_store:162 - loading vector store in 'c24b6ffe59ac485f866a42f921d0f637' to memory.
2024-09-29 19:41:05,330 httpx 34938 INFO HTTP Request: POST http://127.0.0.1:3000/v1/embeddings "HTTP/1.1 200 OK"
2024-09-29 19:41:05,399 faiss.loader 34938 INFO Loading faiss with AVX2 support.
2024-09-29 19:41:05,700 faiss.loader 34938 INFO Successfully loaded faiss with AVX2 support.
2024-09-29 19:41:05.711 | ERROR | chatchat.server.chat.file_chat:upload_temp_docs:108 - Failed to add documents to faiss: tuple index out of range
INFO: 127.0.0.1:52064 - "POST /knowledge_base/upload_temp_docs HTTP/1.1" 200 OK
2024-09-29 19:41:05,714 httpx 34944 INFO HTTP Request: POST http://127.0.0.1:7861/knowledge_base/upload_temp_docs "HTTP/1.1 200 OK"
2024-09-29 20:06:41.578 | INFO | chatchat.server.knowledge_base.utils:file2docs:336 - RapidOCRPDFLoader used for /root/data/temp/fcbaeccb2afe4e91be0cc16554ff1dba/药品-原始记录.pdf
2024-09-29 20:06:41.581 | ERROR | chatchat.server.knowledge_base.utils:get_loader:192 - ImportError: error while looking up loader RapidOCRPDFLoader for file /root/data/temp/fcbaeccb2afe4e91be0cc16554ff1dba/药品-原始记录.pdf: libGL.so.1: cannot open shared object file: No such file or directory
2024-09-29 20:06:41.587 | INFO | chatchat.server.knowledge_base.kb_cache.faiss_cache:load_vector_store:162 - loading vector store in 'fcbaeccb2afe4e91be0cc16554ff1dba' to memory.
2024-09-29 20:06:41,907 httpx 34938 INFO HTTP Request: POST http://127.0.0.1:3000/v1/embeddings "HTTP/1.1 200 OK"
2024-09-29 20:06:41.923 | ERROR | chatchat.server.chat.file_chat:upload_temp_docs:108 - Failed to add documents to faiss: tuple index out of range
INFO: 127.0.0.1:40232 - "POST /knowledge_base/upload_temp_docs HTTP/1.1" 200 OK
2024-09-29 20:06:41,928 httpx 34944 INFO HTTP Request: POST http://127.0.0.1:7861/knowledge_base/upload_temp_docs "HTTP/1.1 200 OK"
INFO: 127.0.0.1:33574 - "POST /knowledge_base/temp_kb/fcbaeccb2afe4e91be0cc16554ff1dba/chat/completions HTTP/1.1" 200 OK
2024-09-29 20:06:50,888 httpx 34944 INFO HTTP Request: POST http://127.0.0.1:7861/knowledge_base/temp_kb/fcbaeccb2afe4e91be0cc16554ff1dba/chat/completions "HTTP/1.1 200 OK"
2024-09-29 20:06:51,152 httpx 34938 INFO HTTP Request: POST http://127.0.0.1:3000/v1/embeddings "HTTP/1.1 200 OK"
2024-09-29 20:06:51,372 httpx 34938 INFO HTTP Request: POST http://127.0.0.1:3000/v1/embeddings "HTTP/1.1 200 OK"
2024-09-29 20:06:51.385 | ERROR | chatchat.server.chat.kb_chat:knowledge_base_chat_iterator:221 - error in knowledge chat: 0
2024-09-29 20:07:23.153 | INFO | chatchat.server.knowledge_base.utils:file2docs:336 - RapidOCRPDFLoader used for /root/data/temp/a198e30a6aa84499be980071c2ecf244/药品-报告.pdf
2024-09-29 20:07:23.157 | ERROR | chatchat.server.knowledge_base.utils:get_loader:192 - ImportError: error while looking up loader RapidOCRPDFLoader for file /root/data/temp/a198e30a6aa84499be980071c2ecf244/药品-报告.pdf: libGL.so.1: cannot open shared object file: No such file or directory
2024-09-29 20:07:23.162 | INFO | chatchat.server.knowledge_base.kb_cache.faiss_cache:load_vector_store:162 - loading vector store in 'a198e30a6aa84499be980071c2ecf244' to memory.
2024-09-29 20:07:23,439 httpx 34938 INFO HTTP Request: POST http://127.0.0.1:3000/v1/embeddings "HTTP/1.1 200 OK"
2024-09-29 20:07:23.455 | ERROR | chatchat.server.chat.file_chat:upload_temp_docs:108 - Failed to add documents to faiss: tuple index out of range
INFO: 127.0.0.1:48134 - "POST /knowledge_base/upload_temp_docs HTTP/1.1" 200 OK
2024-09-29 20:07:23,460 httpx 34944 INFO HTTP Request: POST http://127.0.0.1:7861/knowledge_base/upload_temp_docs "HTTP/1.1 200 OK"
INFO: 127.0.0.1:58078 - "POST /knowledge_base/temp_kb/a198e30a6aa84499be980071c2ecf244/chat/completions HTTP/1.1" 200 OK
2024-09-29 20:07:29,526 httpx 34944 INFO HTTP Request: POST http://127.0.0.1:7861/knowledge_base/temp_kb/a198e30a6aa84499be980071c2ecf244/chat/completions "HTTP/1.1 200 OK"
2024-09-29 20:07:29,792 httpx 34938 INFO HTTP Request: POST http://127.0.0.1:3000/v1/embeddings "HTTP/1.1 200 OK"
2024-09-29 20:07:30,006 httpx 34938 INFO HTTP Request: POST http://127.0.0.1:3000/v1/embeddings "HTTP/1.1 200 OK"
2024-09-29 20:07:30.018 | ERROR | chatchat.server.chat.kb_chat:knowledge_base_chat_iterator:221 - error in knowledge chat: 0
2024-09-29 20:07:48.991 | INFO | chatchat.server.knowledge_base.utils:file2docs:336 - RapidOCRPDFLoader used for /root/data/temp/238b69f6dd4d44968483a9160b7025f8/药品-报告.pdf
2024-09-29 20:07:48.997 | INFO | chatchat.server.knowledge_base.utils:file2docs:336 - RapidOCRPDFLoader used for /root/data/temp/238b69f6dd4d44968483a9160b7025f8/海的女儿.pdf
2024-09-29 20:07:49.000 | ERROR | chatchat.server.knowledge_base.utils:get_loader:192 - ImportError: error while looking up loader RapidOCRPDFLoader for file /root/data/temp/238b69f6dd4d44968483a9160b7025f8/药品-报告.pdf: libGL.so.1: cannot open shared object file: No such file or directory
2024-09-29 20:07:49.008 | ERROR | chatchat.server.knowledge_base.utils:get_loader:192 - ImportError: error while looking up loader RapidOCRPDFLoader for file /root/data/temp/238b69f6dd4d44968483a9160b7025f8/海的女儿.pdf: libGL.so.1: cannot open shared object file: No such file or directory
2024-09-29 20:07:49.011 | INFO | chatchat.server.knowledge_base.kb_cache.faiss_cache:load_vector_store:162 - loading vector store in '238b69f6dd4d44968483a9160b7025f8' to memory.
2024-09-29 20:07:49,305 httpx 34938 INFO HTTP Request: POST http://127.0.0.1:3000/v1/embeddings "HTTP/1.1 200 OK"
2024-09-29 20:07:49.320 | ERROR | chatchat.server.chat.file_chat:upload_temp_docs:108 - Failed to add documents to faiss: tuple index out of range
INFO: 127.0.0.1:47256 - "POST /knowledge_base/upload_temp_docs HTTP/1.1" 200 OK
2024-09-29 20:07:49,324 httpx 34944 INFO HTTP Request: POST http://127.0.0.1:7861/knowledge_base/upload_temp_docs "HTTP/1.1 200 OK"
INFO: 127.0.0.1:56900 - "POST /knowledge_base/temp_kb/238b69f6dd4d44968483a9160b7025f8/chat/completions HTTP/1.1" 200 OK
2024-09-29 20:07:55,602 httpx 34944 INFO HTTP Request: POST http://127.0.0.1:7861/knowledge_base/temp_kb/238b69f6dd4d44968483a9160b7025f8/chat/completions "HTTP/1.1 200 OK"
2024-09-29 20:07:55,863 httpx 34938 INFO HTTP Request: POST http://127.0.0.1:3000/v1/embeddings "HTTP/1.1 200 OK"
2024-09-29 20:07:56,080 httpx 34938 INFO HTTP Request: POST http://127.0.0.1:3000/v1/embeddings "HTTP/1.1 200 OK"
2024-09-29 20:07:56.090 | ERROR | chatchat.server.chat.kb_chat:knowledge_base_chat_iterator:221 - error in knowledge chat: 0
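
A note on this last log: the failure here is not the original TypeError. Every upload stops at an ImportError for libGL.so.1, a system shared library that the loader's dependencies cannot find on this machine. The later "tuple index out of range" and "error in knowledge chat: 0" messages are plausibly follow-on effects of no documents having been loaded, though the log alone does not prove that. A small check like the one below, using only the standard library, can confirm whether the library is loadable in the same environment; can_load_shared_library is a hypothetical helper name. If it prints False, installing the system OpenGL runtime (on Debian or Ubuntu this is typically the libgl1 package) or using a headless OpenCV build are the usual remedies, but which one applies depends on the setup.

```python
import ctypes


def can_load_shared_library(soname: str) -> bool:
    """Return True if the dynamic loader can open the named shared library.

    Hypothetical helper for illustration only.
    """
    try:
        # CDLL asks the platform's dynamic loader to open the library;
        # a missing or unloadable library raises OSError.
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


# The library named in the ImportError above.
print(can_load_shared_library("libGL.so.1"))
```

Run it with the same Python interpreter that starts chatchat (here the conda environment myenv), since the result reflects the library search path of that process.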