chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly langchain-ChatGLM): a local-knowledge-based RAG and Agent application built with Langchain and LLMs such as ChatGLM, Qwen and Llama
Apache License 2.0
31.26k stars 5.45k forks

[BUG] Error when initializing the knowledge base #1054

Closed loner0906 closed 1 year ago

loner0906 commented 1 year ago

Problem Description: I installed requirements.txt as instructed and verified that langchain==0.0.257 is installed.
Initializing the knowledge base with: python init_database.py --recreate-vs
fails with: exception: partition() got an unexpected keyword argument 'autodetect_encoding'

imClumsyPanda commented 1 year ago

Did you create a fresh development environment and reinstall the Python dependencies as described in the README?

DemoQAQ commented 1 year ago

[image] This is the error I get after entering the command.

liunux4odoo commented 1 year ago

@DemoQAQ Please post a complete log.

DemoQAQ commented 1 year ago

Did you create a fresh development environment and reinstall the Python dependencies as described in the README?

Yes, everything was installed.

DemoQAQ commented 1 year ago

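@DemoQAQ Please post a complete log.

I have solved the problem with building the vector knowledge base; my initial suspicion is that it was caused by the m3e-base model. Now I have run into a different problem: running chatglm2-6b from a local path times out, while the cloud-hosted model works fine. Is my machine simply too weak? CPU 3950X, GPU RTX 2070, RAM 32 GB.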

DemoQAQ commented 1 year ago

@DemoQAQ Please post a complete log.

[image] This is the current problem.

DemoQAQ commented 1 year ago

@DemoQAQ Please post a complete log.

[image] This is the current problem.

When I send a query, I get a timeout error.

DemoQAQ commented 1 year ago

[image] When I rebuilt the vector store, I got this error.

zoo17qian commented 1 year ago

Switch your Python version to 3.10.

imClumsyPanda commented 1 year ago

Can this issue still be reproduced with the latest version of the code?

loner0906 commented 1 year ago

I re-cloned the code and reinstalled the dependencies from scratch. It still fails when run:
PS E:\code\Langchain-Chatchat> python init_database.py --recreate-vs
2023-08-21 14:45:42,916 - utils.py[line:148] - INFO: Note: NumExpr detected 56 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2023-08-21 14:45:42,917 - utils.py[line:160] - INFO: NumExpr defaulting to 8 threads.
database talbes created
recreating all vector stores
UnstructuredFileLoader partition() got an unexpected keyword argument 'autodetect_encoding'
PS E:\code\Langchain-Chatchat>

solofeng commented 1 year ago

You need to upgrade unstructured and unstructured-inference. Mine are unstructured-0.10.5 and unstructured-inference-0.5.16.
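Before upgrading or downgrading, it can help to confirm which versions are actually installed in the active environment. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `report_versions` are hypothetical, and the package list is simply the set of names mentioned in this thread.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Packages mentioned in this thread; adjust the list as needed.
PACKAGES = ("langchain", "unstructured", "unstructured-inference")


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the same virtual environment that runs init_database.py shows whether the versions match what other commenters report.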

xiaoxiao261258 commented 1 year ago

You need to upgrade unstructured and unstructured-inference. Mine are unstructured-0.10.5 and unstructured-inference-0.5.16.

That did the trick! @solofeng

mhghappy commented 1 year ago

I get an error the first time I initialize the knowledge base. Machine configuration: macOS 13.5.1, Intel CPU 8h, 32 GB RAM, Python 3.9

Error log:
$ NUMEXPR_MAX_THREADS=1 python init_database.py --recreate-vs

==============================Langchain-Chatchat Configuration==============================
Operating system: macOS-10.16-x86_64-i386-64bit.
Python version: 3.9.17 (main, Jul 5 2023, 16:17:03) [Clang 14.0.6 ]
Project version: v0.2.2
langchain version: 0.0.266. fastchat version: 0.2.24

Current LLM model: chatglm2-6b @ cpu
{'api_base_url': 'http://localhost:8888/v1', 'api_key': 'EMPTY', 'local_model_path': '/Users/xx/Documents/chatglmmode/chatglm2-6b'}
Current Embeddings model: m3e-base @ cpu
==============================Langchain-Chatchat Configuration==============================

database talbes created
recreating all vector stores
loading vector store in 'samples'.
2023-08-26 21:08:58,133 - SentenceTransformer.py[line:66] - INFO: Load pretrained SentenceTransformer: moka-ai/m3e-base
2023-08-26 21:08:58,262 - instantiator.py[line:21] - INFO: Created a temporary directory at /var/folders/qp/9vz8wrf96235c8xj4lfptrtw0000gn/T/tmpd21165kx
2023-08-26 21:08:58,263 - instantiator.py[line:76] - INFO: Writing /var/folders/qp/9vz8wrf96235c8xj4lfptrtw0000gn/T/tmpd21165kx/_remote_module_non_scriptable.py
Batches: 100%|████████████████████████████████████| 1/1 [00:00<00:00, 33.88it/s]
2023-08-26 21:08:58,991 - loader.py[line:54] - INFO: Loading faiss with AVX2 support.
2023-08-26 21:08:59,025 - loader.py[line:56] - INFO: Successfully loaded faiss with AVX2 support.
zsh: segmentation fault NUMEXPR_MAX_THREADS=1 python init_database.py --recreate-vs
/Users/xx/Documents/soft/python3.9-chatglm2/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

What is causing this? Any help would be appreciated.

pip list:

Package Version
accelerate 0.22.0
aiofiles 23.2.1
aiohttp 3.8.5
aiosignal 1.3.1
altair 5.0.1
annotated-types 0.5.0
antlr4-python3-runtime 4.9.3
anyio 3.7.1
async-timeout 4.0.3
attrs 23.1.0
beautifulsoup4 4.12.2
blinker 1.6.2
blis 0.7.10
cachetools 5.3.1
catalogue 2.0.9
certifi 2023.7.22
cffi 1.15.1
chardet 5.2.0
charset-normalizer 3.2.0
click 8.1.7
coloredlogs 15.0.1
confection 0.1.1
contourpy 1.1.0
cpm-kernels 1.0.11
cryptography 41.0.3
cycler 0.11.0
cymem 2.0.7
dataclasses-json 0.5.14
EbookLib 0.18
effdet 0.4.1
emoji 2.8.0
et-xmlfile 1.1.0
exceptiongroup 1.1.3
faiss-cpu 1.7.4
fastapi 0.99.1
ffmpy 0.3.1
filelock 3.12.2
filetype 1.2.0
flatbuffers 23.5.26
fonttools 4.42.1
frozenlist 1.4.0
fschat 0.2.24
fsspec 2023.6.0
gitdb 4.0.10
GitPython 3.1.32
gradio 3.41.1
gradio_client 0.5.0
greenlet 2.0.2
h11 0.14.0
httpcore 0.17.3
httpx 0.24.1
huggingface-hub 0.16.4
humanfriendly 10.0
idna 3.4
importlib-metadata 6.8.0
importlib-resources 6.0.1
iopath 0.1.10
Jinja2 3.1.2
joblib 1.3.2
jsonschema 4.19.0
jsonschema-specifications 2023.7.1
kiwisolver 1.4.5
langchain 0.0.266
langcodes 3.3.0
langsmith 0.0.26
latex2mathml 3.76.0
layoutparser 0.3.4
lxml 4.9.3
Markdown 3.4.4
markdown-it-py 3.0.0
markdown2 2.4.10
MarkupSafe 2.1.3
marshmallow 3.20.1
matplotlib 3.7.2
mdtex2html 1.2.0
mdurl 0.1.2
mpmath 1.3.0
msg-parser 1.2.0
multidict 6.0.4
murmurhash 1.0.9
mypy-extensions 1.0.0
networkx 3.1
nh3 0.2.14
nltk 3.8.1
numexpr 2.8.5
numpy 1.24.4
olefile 0.46
omegaconf 2.3.0
onnxruntime 1.15.1
openai 0.27.9
openapi-schema-pydantic 1.2.4
opencv-python 4.8.0.76
openpyxl 3.1.2
orjson 3.9.5
packaging 23.1
pandas 2.0.3
pathy 0.10.2
pdf2image 1.16.3
pdfminer.six 20221105
pdfplumber 0.10.2
Pillow 9.5.0
pip 23.2.1
portalocker 2.7.0
preshed 3.0.8
prompt-toolkit 3.0.39
protobuf 4.24.1
psutil 5.9.5
pyarrow 13.0.0
pycocotools 2.0.7
pycparser 2.21
pydantic 1.10.12
pydantic_core 2.6.3
pydeck 0.8.0
pydub 0.25.1
Pygments 2.16.1
Pympler 1.0.1
pypandoc 1.11
pyparsing 3.0.9
pypdfium2 4.18.0
pytesseract 0.3.10
python-dateutil 2.8.2
python-decouple 3.8
python-docx 0.8.11
python-magic 0.4.27
python-multipart 0.0.6
python-pptx 0.6.21
pytz 2023.3
pytz-deprecation-shim 0.1.0.post0
PyYAML 6.0.1
referencing 0.30.2
regex 2023.8.8
requests 2.31.0
rich 13.5.2
rpds-py 0.9.2
safetensors 0.3.3
scikit-learn 1.3.0
scipy 1.11.2
semantic-version 2.10.0
sentence-transformers 2.2.2
sentencepiece 0.1.99
setuptools 68.0.0
shortuuid 1.0.11
simplejson 3.19.1
six 1.16.0
smart-open 6.3.0
smmap 5.0.0
sniffio 1.3.0
soupsieve 2.4.1
spacy 3.6.1
spacy-legacy 3.0.12
spacy-loggers 1.0.4
spacy-pkuseg 0.0.32
SQLAlchemy 2.0.19
srsly 2.4.7
sse-starlette 1.6.5
starlette 0.27.0
streamlit 1.26.0
streamlit-aggrid 0.3.4.post3
streamlit-antd-components 0.1.16
streamlit-chatbox 1.1.7
streamlit-option-menu 0.3.6
svgwrite 1.4.3
sympy 1.12
tabulate 0.9.0
tenacity 8.2.3
thinc 8.1.12
threadpoolctl 3.2.0
tiktoken 0.4.0
timm 0.9.5
tokenizers 0.13.3
toml 0.10.2
toolz 0.12.0
torch 2.0.1
torchvision 0.15.2
tornado 6.3.3
tqdm 4.66.1
transformers 4.32.0
typer 0.9.0
typing_extensions 4.7.1
typing-inspect 0.9.0
tzdata 2023.3
tzlocal 4.3.1
unstructured 0.10.6
unstructured-inference 0.5.17
urllib3 2.0.4
uvicorn 0.23.2
validators 0.21.2
wasabi 1.1.2
watchdog 3.0.0
wavedrom 2.0.3.post3
wcwidth 0.2.6
websockets 11.0.3
wheel 0.38.4
xlrd 2.0.1
XlsxWriter 3.1.2
yarl 1.9.2
zh-core-web-sm 3.6.0
zipp 3.16.2

glide-the commented 1 year ago

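In the current 0.2 release, the text splitter depends on unstructured. If you hit an error saying that a function received a keyword argument it does not accept, such as

partition() got an unexpected keyword argument 'autodetect_encoding'

try fixing it with the following command: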

pip install unstructured==0.9.0 unstructured-inference==0.5.7
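
To see whether the installed unstructured release accepts the argument at all, one option is to inspect the function signature before re-running the initialization. This is a minimal sketch, not part of the project: `keyword_support` is a hypothetical helper, and it assumes the failing call resolves to `unstructured.partition.auto.partition`.

```python
import inspect
from typing import Callable


def keyword_support(func: Callable, name: str) -> str:
    """Classify how `func` treats the keyword argument `name`.

    Returns "explicit" if `name` is a named parameter that can be passed by
    keyword, "var_keyword" if it would be absorbed by **kwargs, and
    "unsupported" otherwise.
    """
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return "explicit"
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return "var_keyword"
    return "unsupported"


if __name__ == "__main__":
    try:
        # Assumption: this is the function that raised the error in this thread.
        from unstructured.partition.auto import partition
    except Exception as exc:  # the import can fail for reasons other than absence
        print(f"could not import unstructured: {exc}")
    else:
        print("autodetect_encoding:", keyword_support(partition, "autodetect_encoding"))
```

A result of "unsupported" matches the error reported above. A result of "var_keyword" only means the top-level call will not reject the argument; the function may still forward it to code that does.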