X-D-Lab / LangChain-ChatGLM-Webui

Automatic question answering over local knowledge bases, built on LangChain and LLMs such as the ChatGLM-6B series
Apache License 2.0

Uploaded .txt file cannot be read: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 0: ordinal not in range(128) #87

Open lunar333 opened 1 year ago

lunar333 commented 1 year ago

Traceback (most recent call last):
  File "/home/zhonghuihang/miniconda3/envs/lang/lib/python3.8/site-packages/gradio/routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/zhonghuihang/miniconda3/envs/lang/lib/python3.8/site-packages/gradio/blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "/home/zhonghuihang/miniconda3/envs/lang/lib/python3.8/site-packages/gradio/blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/zhonghuihang/miniconda3/envs/lang/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/zhonghuihang/miniconda3/envs/lang/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/zhonghuihang/miniconda3/envs/lang/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "app.py", line 194, in init_vector_store
    vector_store = knowladge_based_chat_llm.init_knowledge_vector_store(
  File "app.py", line 88, in init_knowledge_vector_store
    docs = self.load_file(filepath)
  File "app.py", line 153, in load_file
    docs = loader.load_and_split(text_splitter=textsplitter)
  File "/home/zhonghuihang/miniconda3/envs/lang/lib/python3.8/site-packages/langchain/document_loaders/base.py", line 25, in load_and_split
    docs = self.load()
  File "/home/zhonghuihang/miniconda3/envs/lang/lib/python3.8/site-packages/langchain/document_loaders/unstructured.py", line 61, in load
    elements = self._get_elements()
  File "/home/zhonghuihang/miniconda3/envs/lang/lib/python3.8/site-packages/langchain/document_loaders/unstructured.py", line 95, in _get_elements
    return partition(filename=self.file_path, **self.unstructured_kwargs)
  File "/home/zhonghuihang/miniconda3/envs/lang/lib/python3.8/site-packages/unstructured/partition/auto.py", line 108, in partition
    filetype = detect_filetype(
  File "/home/zhonghuihang/miniconda3/envs/lang/lib/python3.8/site-packages/unstructured/file_utils/filetype.py", line 231, in detect_filetype
    if _is_text_file_a_json(file=file, filename=filename):
  File "/home/zhonghuihang/miniconda3/envs/lang/lib/python3.8/site-packages/unstructured/file_utils/filetype.py", line 313, in _is_text_file_a_json
    file_text = f.read()
  File "/home/zhonghuihang/miniconda3/envs/lang/lib/python3.8/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 0: ordinal not in range(128)
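The last frames show where it fails: `detect_filetype` in `unstructured` calls `_is_text_file_a_json`, which reads the uploaded file in text mode, and the decode goes through `encodings/ascii.py`. That suggests the file was opened without an explicit encoding, so Python fell back to the locale's preferred encoding, which on this machine resolved to ASCII. Byte `0xe8` is the lead byte of a three-byte UTF-8 sequence (for example the CJK character 这, U+8FD9, encodes as E8 BF 99), so a UTF-8 Chinese text that starts with such a character fails at position 0. The stdlib-only sketch below reproduces the same message and shows the file reads fine once the encoding is given explicitly; the helper `try_read` and the sample text are illustrative assumptions, not code from this repository or from `unstructured`.

```python
import os
import tempfile


def try_read(path: str, encoding: str):
    """Read a whole text file with an explicit encoding (hypothetical helper).

    Returns (text, None) on success, or (None, error_message) if decoding fails.
    """
    try:
        with open(path, "r", encoding=encoding) as f:
            return f.read(), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# Illustrative UTF-8 sample: "这" (U+8FD9) encodes as E8 BF 99, so byte 0 of
# the file is 0xe8, matching the position and byte in the reported error.
sample = "这是一个测试"
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    path = tmp.name

ascii_text, ascii_err = try_read(path, "ascii")  # what an ASCII locale default does
utf8_text, utf8_err = try_read(path, "utf-8")    # explicit UTF-8 succeeds
os.remove(path)

print(ascii_err)
print(utf8_text)
```

The point of passing `encoding=` explicitly is that the result no longer depends on the environment the server happens to be started in.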

Yanllan commented 8 months ago

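Please try adding the following at the beginning of the code (at least before the point where the error is raised):

    import sys  # the module must be imported before calling reload()
    reload(sys)
    sys.setdefaultencoding('utf-8')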
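The snippet above is a Python 2 idiom. The traceback shows Python 3.8, where `reload` is no longer a builtin (it lives in `importlib.reload`) and `sys.setdefaultencoding` does not exist, so those lines raise an error rather than change anything. On Python 3.7 and later, a comparable effect for file I/O is Python's UTF-8 Mode, enabled with `python -X utf8` or the environment variable `PYTHONUTF8=1`, which makes `open()` default to UTF-8 regardless of the locale. The sketch below is only a check under that assumption, not the fix the project adopted: the hypothetical helper `preferred_encoding` starts a fresh interpreter and reports the normalized default encoding it would use.

```python
import subprocess
import sys

# Python 3 has no sys.setdefaultencoding, and reload() is not a builtin,
# so the Python 2 snippet fails on 3.x (NameError on the reload(sys) line).
print(hasattr(sys, "setdefaultencoding"))  # False on Python 3

CHILD_CODE = (
    "import codecs, locale; "
    "print(codecs.lookup(locale.getpreferredencoding(False)).name)"
)


def preferred_encoding(utf8_mode: bool) -> str:
    """Report the normalized default text encoding of a fresh interpreter.

    Hypothetical helper: with utf8_mode=True the child runs with -X utf8,
    which has the same effect as setting PYTHONUTF8=1.
    """
    cmd = [sys.executable]
    if utf8_mode:
        cmd += ["-X", "utf8"]
    cmd += ["-c", CHILD_CODE]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(preferred_encoding(utf8_mode=False))  # depends on the current locale
print(preferred_encoding(utf8_mode=True))   # utf-8
```

For this app that would mean starting it as `PYTHONUTF8=1 python app.py` (assuming `app.py` is the entry point, as the traceback suggests); launching under a UTF-8 locale, such as `LANG=C.UTF-8` where that locale is installed, is another common option.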

123456ADWAE2 commented 7 months ago

See: https://github.com/X-D-Lab/LangChain-ChatGLM-Webui/pull/148