chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge-based RAG and Agent application built with Langchain and language models such as ChatGLM, Qwen and Llama
Apache License 2.0

Error when running python init_database.py --recreate-vs #2096

Closed sxsxsx closed 11 months ago

sxsxsx commented 11 months ago

Problem Description: Describe the problem in a clear and concise manner.

Steps to Reproduce

  1. Run python init_database.py --recreate-vs

Expected Result: the script runs normally.

Actual Result

recreating all vector stores
2023-11-17 09:51:39,316 - faiss_cache.py[line:80] - INFO: loading vector store in 'samples/vector_store/m3e-base' from disk.
2023-11-17 09:51:39,743 - SentenceTransformer.py[line:66] - INFO: Load pretrained SentenceTransformer: m3e-base
2023-11-17 09:51:39,743 - SentenceTransformer.py[line:805] - WARNING: No sentence-transformers model found with name m3e-base. Creating a new one with MEAN pooling.
2023-11-17 09:51:39,743 - embeddings_api.py[line:39] - ERROR: m3e-base does not appear to have a file named config.json. Checkout 'https://huggingface.co/m3e-base/None' for available files.
AttributeError: 'NoneType' object has no attribute 'conjugate'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/workdir/tevs_multi_idc_10g_20220825163730/lyq/langchain/Langchain-Chatchat/init_database.py", line 108, in <module>
    folder2db(kb_names=args.kb_name, mode="recreate_vs", embed_model=args.embed_model)
  File "/workspace/workdir/tevs_multi_idc_10g_20220825163730/lyq/langchain/Langchain-Chatchat/server/knowledge_base/migrate.py", line 118, in folder2db
    kb.create_kb()
  File "/workspace/workdir/tevs_multi_idc_10g_20220825163730/lyq/langchain/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 80, in create_kb
    self.do_create_kb()
  File "/workspace/workdir/tevs_multi_idc_10g_20220825163730/lyq/langchain/Langchain-Chatchat/server/knowledge_base/kb_service/faiss_kb_service.py", line 47, in do_create_kb
    self.load_vector_store()
  File "/workspace/workdir/tevs_multi_idc_10g_20220825163730/lyq/langchain/Langchain-Chatchat/server/knowledge_base/kb_service/faiss_kb_service.py", line 28, in load_vector_store
    return kb_faiss_pool.load_vector_store(kb_name=self.kb_name,
  File "/workspace/workdir/tevs_multi_idc_10g_20220825163730/lyq/langchain/Langchain-Chatchat/server/knowledge_base/kb_cache/faiss_cache.py", line 90, in load_vector_store
    vector_store = self.new_vector_store(embed_model=embed_model, embed_device=embed_device)
  File "/workspace/workdir/tevs_multi_idc_10g_20220825163730/lyq/langchain/Langchain-Chatchat/server/knowledge_base/kb_cache/faiss_cache.py", line 48, in new_vector_store
    vector_store = FAISS.from_documents([doc], embeddings, normalize_L2=True)
  File "/home/admin/miniconda3/envs/chatchat/lib/python3.10/site-packages/langchain/schema/vectorstore.py", line 510, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/home/admin/miniconda3/envs/chatchat/lib/python3.10/site-packages/langchain/vectorstores/faiss.py", line 911, in from_texts
    embeddings = embedding.embed_documents(texts)
  File "/workspace/workdir/tevs_multi_idc_10g_20220825163730/lyq/langchain/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 380, in embed_documents
    return normalize(embeddings).tolist()
  File "/workspace/workdir/tevs_multi_idc_10g_20220825163730/lyq/langchain/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 37, in normalize
    norm = np.linalg.norm(embeddings, axis=1)
  File "<__array_function__ internals>", line 200, in norm
  File "/home/admin/miniconda3/envs/chatchat/lib/python3.10/site-packages/numpy/linalg/linalg.py", line 2541, in norm
    s = (x.conj() * x).real
TypeError: loop of ufunc does not support argument 0 of type NoneType which has no callable conjugate method

Environment Information

accelerate 0.24.1 aiofiles 23.2.1 aiohttp 3.8.6 aiolimiter 1.1.0 aiosignal 1.3.1 altair 5.1.2 antlr4-python3-runtime 4.9.3 anyio 3.7.1 async-timeout 4.0.3 attrs 23.1.0 backoff 2.2.1 bce-python-sdk 0.8.96 beautifulsoup4 4.12.2 blinker 1.7.0 blis 0.7.11 Brotli 1.1.0 cachetools 5.3.2 catalogue 2.0.10 certifi 2023.7.22 cffi 1.16.0 chardet 5.2.0 charset-normalizer 3.3.2 click 8.1.7 cloudpathlib 0.16.0 coloredlogs 15.0.1 confection 0.1.3 contourpy 1.2.0 cryptography 41.0.5 cycler 0.12.1 cymem 2.0.8 dashscope 1.13.3 dataclasses 0.6 dataclasses-json 0.6.2 distro 1.8.0 duckduckgo-search 3.8.5 effdet 0.4.1 einops 0.7.0 emoji 2.8.0 et-xmlfile 1.1.0 exceptiongroup 1.1.3 faiss-cpu 1.7.4 fastapi 0.104.1 filelock 3.13.1 filetype 1.2.0 flatbuffers 23.5.26 fonttools 4.44.3 frozenlist 1.4.0 fschat 0.2.32 fsspec 2023.10.0 future 0.18.3 gitdb 4.0.11 GitPython 3.1.40 greenlet 3.0.1 h11 0.14.0 h2 4.1.0 hpack 4.0.0 httpcore 0.17.3 httptools 0.6.1 httpx 0.25.1 huggingface-hub 0.19.4 humanfriendly 10.0 hyperframe 6.0.1 idna 3.4 importlib-metadata 6.8.0 iniconfig 2.0.0 iopath 0.1.10 Jinja2 3.1.2 joblib 1.3.2 jsonpatch 1.33 jsonpointer 2.4 jsonschema 4.20.0 jsonschema-specifications 2023.11.1 kiwisolver 1.4.5 langchain 0.0.336 langchain-experimental 0.0.41 langcodes 3.3.0 langdetect 1.0.9 langsmith 0.0.64 layoutparser 0.3.4 lxml 4.9.3 Markdown 3.5.1 markdown-it-py 3.0.0 markdown2 2.4.10 markdownify 0.11.6 MarkupSafe 2.1.3 marshmallow 3.20.1 matplotlib 3.8.1 mdurl 0.1.2 metaphor-python 0.1.20 mpmath 1.3.0 msg-parser 1.2.0 msgpack 1.0.7 multidict 6.0.4 murmurhash 1.0.10 mypy-extensions 1.0.0 networkx 3.2.1 nh3 0.2.14 ninja 1.11.1.1 nltk 3.8.1 numexpr 2.8.7 numpy 1.24.4 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.18.1 nvidia-nvjitlink-cu12 12.3.101 
nvidia-nvtx-cu12 12.1.105 olefile 0.46 omegaconf 2.3.0 onnx 1.15.0 onnxruntime 1.15.1 openai 1.3.2 opencv-python 4.8.1.78 openpyxl 3.1.2 packaging 23.2 pandas 2.0.3 pathlib 1.0.1 pdf2image 1.16.3 pdfminer.six 20221105 pdfplumber 0.10.3 peft 0.6.2 Pillow 9.5.0 pip 23.3.1 pluggy 1.3.0 portalocker 2.8.2 preshed 3.0.9 prompt-toolkit 3.0.41 protobuf 4.25.1 psutil 5.9.6 pyarrow 14.0.1 pyclipper 1.3.0.post5 pycocotools 2.0.7 pycparser 2.21 pycryptodome 3.19.0 pydantic 1.10.13 pydeck 0.8.1b0 Pygments 2.16.1 PyJWT 2.8.0 PyMuPDF 1.23.6 PyMuPDFb 1.23.6 pypandoc 1.12 pyparsing 3.1.1 pypdfium2 4.24.0 pytesseract 0.3.10 pytest 7.4.3 python-dateutil 2.8.2 python-decouple 3.8 python-docx 1.1.0 python-dotenv 1.0.0 python-iso639 2023.6.15 python-magic 0.4.27 python-multipart 0.0.6 python-pptx 0.6.23 pytz 2023.3.post1 PyYAML 6.0.1 qianfan 0.1.1 rapidfuzz 3.5.2 rapidocr-onnxruntime 1.3.8 ray 2.8.0 referencing 0.31.0 regex 2023.10.3 requests 2.31.0 rich 13.7.0 rpds-py 0.13.0 safetensors 0.4.0 scikit-learn 1.3.2 scipy 1.11.3 sentence-transformers 2.2.2 sentencepiece 0.1.99 setuptools 68.2.2 shapely 2.0.2 shortuuid 1.0.11 simplejson 3.19.2 six 1.16.0 smart-open 6.4.0 smmap 5.0.1 sniffio 1.3.0 socksio 1.0.0 soupsieve 2.5 spacy 3.7.2 spacy-legacy 3.0.12 spacy-loggers 1.0.5 SQLAlchemy 2.0.19 srsly 2.4.8 starlette 0.27.0 streamlit 1.27.2 streamlit-aggrid 0.3.4.post3 streamlit-antd-components 0.2.3 streamlit-chatbox 1.1.11 streamlit-feedback 0.1.2 streamlit-option-menu 0.3.6 strsimpy 0.2.1 svgwrite 1.4.3 sympy 1.12 tabulate 0.9.0 tenacity 8.2.3 thinc 8.2.1 threadpoolctl 3.2.0 tiktoken 0.5.1 timm 0.9.10 tokenizers 0.15.0 toml 0.10.2 tomli 2.0.1 toolz 0.12.0 torch 2.1.0 torchaudio 2.1.0 torchvision 0.16.0 tornado 6.3.3 tqdm 4.66.1 transformers 4.35.2 transformers-stream-generator 0.0.4 triton 2.1.0 typer 0.9.0 typing_extensions 4.8.0 typing-inspect 0.9.0 tzdata 2023.3 tzlocal 5.2 unstructured 0.10.30 unstructured-inference 0.7.11 unstructured.pytesseract 0.3.12 urllib3 2.1.0 uvicorn 0.23.2 
uvloop 0.19.0 validators 0.22.0 vllm 0.2.0 wasabi 1.1.2 watchdog 3.0.0 watchfiles 0.21.0 wavedrom 2.0.3.post3 wcwidth 0.2.10 weasel 0.3.4 websockets 12.0 wheel 0.41.3 xformers 0.0.22.post7 xlrd 2.0.1 XlsxWriter 3.1.9 yarl 1.9.2 zhipuai 1.0.7 zipp 3.17.0

Atomthin commented 11 months ago

I'm running into the same problem.

Atomthin commented 11 months ago

Same error, but triggered by something different.

(/data/AI-APP/knowledge) root@ecs-2b1c-1110258:/data/Langchain-Chatchat# python init_database.py --recreate-vs
recreating all vector stores
2023-11-20 11:27:06,155 - faiss_cache.py[line:80] - INFO: loading vector store in 'samples/vector_store/m3e-base' from disk.
2023-11-20 11:27:06,517 - SentenceTransformer.py[line:66] - INFO: Load pretrained SentenceTransformer: /data/llm/moka-ai/m3e-base
2023-11-20 11:27:06,982 - embeddings_api.py[line:39] - ERROR: Error while deserializing header: HeaderTooLarge
AttributeError: 'NoneType' object has no attribute 'conjugate'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "init_database.py", line 108, in <module>
    folder2db(kb_names=args.kb_name, mode="recreate_vs", embed_model=args.embed_model)
  File "/data/Langchain-Chatchat/server/knowledge_base/migrate.py", line 118, in folder2db
    kb.create_kb()
  File "/data/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 80, in create_kb
    self.do_create_kb()
  File "/data/Langchain-Chatchat/server/knowledge_base/kb_service/faiss_kb_service.py", line 47, in do_create_kb
    self.load_vector_store()
  File "/data/Langchain-Chatchat/server/knowledge_base/kb_service/faiss_kb_service.py", line 28, in load_vector_store
    return kb_faiss_pool.load_vector_store(kb_name=self.kb_name,
  File "/data/Langchain-Chatchat/server/knowledge_base/kb_cache/faiss_cache.py", line 90, in load_vector_store
    vector_store = self.new_vector_store(embed_model=embed_model, embed_device=embed_device)
  File "/data/Langchain-Chatchat/server/knowledge_base/kb_cache/faiss_cache.py", line 48, in new_vector_store
    vector_store = FAISS.from_documents([doc], embeddings, normalize_L2=True)
  File "/data/AI-APP/knowledge/lib/python3.8/site-packages/langchain/schema/vectorstore.py", line 510, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/data/AI-APP/knowledge/lib/python3.8/site-packages/langchain/vectorstores/faiss.py", line 911, in from_texts
    embeddings = embedding.embed_documents(texts)
  File "/data/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 380, in embed_documents
    return normalize(embeddings).tolist()
  File "/data/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 37, in normalize
    norm = np.linalg.norm(embeddings, axis=1)
  File "<__array_function__ internals>", line 200, in norm
  File "/data/AI-APP/knowledge/lib/python3.8/site-packages/numpy/linalg/linalg.py", line 2541, in norm
    s = (x.conj() * x).real
TypeError: loop of ufunc does not support argument 0 of type NoneType which has no callable conjugate method
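
Both tracebacks above end in np.linalg.norm, which suggests that the embedding step had already failed (the ERROR line logged by embeddings_api.py) and that None was then passed on to normalize. The sketch below works under that assumption: it reproduces the NumPy error with None as input and shows a guard that fails earlier with a clearer message. safe_normalize is a hypothetical helper for illustration, not code from Langchain-Chatchat.

```python
import numpy as np


def safe_normalize(embeddings):
    """L2-normalize each row; hypothetical helper, not the project's normalize()."""
    if embeddings is None:
        # Assumption: a failed embedding call surfaces here as None.
        raise ValueError(
            "embedding call returned None; check the ERROR line in the log "
            "(model path, missing or corrupted files, network access)"
        )
    arr = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    return arr / norms


if __name__ == "__main__":
    # Reproduce the confusing error from the traceback: None becomes an
    # object array, and NumPy then tries to call None.conjugate().
    try:
        np.linalg.norm(None, axis=1)
    except TypeError as exc:
        print("numpy error:", exc)

    # A valid input normalizes as expected.
    print(safe_normalize([[3.0, 4.0]]).tolist())
```

In other words, the TypeError is only a symptom. The line to act on is the ERROR logged just before it.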

hzg0601 commented 11 months ago

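Judging from the error message, most likely your embedding model path is set incorrectly, or files are missing or corrupted. Carefully compare the sizes of your local files with the ones on Hugging Face, and check whether any files are missing.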
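
The check suggested above can be sketched in a few lines. This is only an illustration: inspect_model_dir is a hypothetical helper, not part of Langchain-Chatchat, and it covers just two symptoms, a missing config.json and weight files that are really Git LFS pointer stubs (small text files left behind when a clone did not fetch the LFS objects). Comparing sizes against the file listing on Hugging Face still has to be done by hand.

```python
from pathlib import Path

# Git LFS pointer files are small text files that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def inspect_model_dir(model_dir):
    """Return a list of problems found in a local model directory."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]

    problems = []
    if not (root / "config.json").is_file():
        problems.append("config.json is missing")

    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".git" in rel.parts or not path.is_file():
            continue
        size = path.stat().st_size
        if size == 0:
            problems.append(f"{rel} is empty")
        elif size < 1024:
            # Pointer stubs are well under 1 KiB, so only small files are read.
            with path.open("rb") as fh:
                head = fh.read(len(LFS_POINTER_PREFIX))
            if head == LFS_POINTER_PREFIX:
                problems.append(f"{rel} is a Git LFS pointer, not the real file")
    return problems


# Example call; the path is the one from the log above and will differ per setup:
# print(inspect_model_dir("/data/llm/moka-ai/m3e-base"))
```

An empty list only means that none of these specific symptoms were found, not that the weights are intact.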

Atomthin commented 11 months ago

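@hzg0601 Thanks. After re-downloading the model weights the problem was solved and initialization works. The files I originally fetched with git lfs clone were faulty.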

wzhwzh6666 commented 11 months ago

AttributeError: 'NoneType' object has no attribute 'conjugate'. This problem still seems to be there after reinstalling. Is there any way to fix it?

zRzRzRzRzRzRzR commented 11 months ago

The weight files are incomplete; m3e was probably not downloaded from Hugging Face.

leonyu879 commented 10 months ago

try git lfs pull

hustmse1 commented 10 months ago

The weight files are incomplete; m3e was probably not downloaded from Hugging Face.

I did download m3e from Hugging Face, just not with these two commands:
$ git clone https://huggingface.co/THUDM/chatglm3-6b
$ git clone https://huggingface.co/BAAI/bge-large-zh

LicoCoder commented 9 months ago

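@hzg0601 Thanks. After re-downloading the model weights the problem was solved and initialization works. The files I originally fetched with git lfs clone were faulty.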

Hello, how do I re-download the model weights? I'm getting the same error.
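
One possible way to re-download, sketched under assumptions: it uses snapshot_download from the huggingface_hub package (which appears in the environment list of the original report) instead of git, so no Git LFS step is involved. The repo id moka-ai/m3e-base and the base directory /data/llm are inferred from the local path in the log above and may not match your setup. local_dir_for and redownload_model are hypothetical helper names.

```python
from pathlib import Path


def local_dir_for(repo_id, base_dir):
    """Mirror the repo id under base_dir, e.g. /data/llm/moka-ai/m3e-base."""
    return Path(base_dir) / repo_id


def redownload_model(repo_id, base_dir):
    """Fetch every file of a Hugging Face model repo into a local directory."""
    # Imported here so local_dir_for() stays usable without huggingface_hub.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, base_dir)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


# Example call (needs network access to huggingface.co; both arguments are
# assumptions taken from the log above):
# redownload_model("moka-ai/m3e-base", "/data/llm")
```

After the download, the embedding model path in the project configuration has to point at the resulting directory.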

LESIONS110 commented 6 months ago

why?

recreating all vector stores
2024-04-11 21:25:37,470 - faiss_cache.py[line:92] - INFO: loading vector store in 'samples/vector_store/bge-large-zh-v1.5' from disk.
2024-04-11 21:25:37,715 - SentenceTransformer.py[line:66] - INFO: Load pretrained SentenceTransformer: BAAI/bge-large-zh-v1.5
2024-04-11 21:25:37,726 - embeddings_api.py[line:39] - ERROR: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/BAAI/bge-large-zh-v1.5 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f3864039c90>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: 5ff223d4-15b7-4bef-9671-2f24f7f76c90)')
AttributeError: 'NoneType' object has no attribute 'conjugate'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/Langchain-Chatchat/init_database.py", line 107, in <module>
    folder2db(kb_names=args.kb_name, mode="recreate_vs", embed_model=args.embed_model)
  File "/home/ubuntu/Langchain-Chatchat/server/knowledge_base/migrate.py", line 121, in folder2db
    kb.create_kb()
  File "/home/ubuntu/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 81, in create_kb
    self.do_create_kb()
  File "/home/ubuntu/Langchain-Chatchat/server/knowledge_base/kb_service/faiss_kb_service.py", line 51, in do_create_kb
    self.load_vector_store()
  File "/home/ubuntu/Langchain-Chatchat/server/knowledge_base/kb_service/faiss_kb_service.py", line 28, in load_vector_store
    return kb_faiss_pool.load_vector_store(kb_name=self.kb_name,
  File "/home/ubuntu/Langchain-Chatchat/server/knowledge_base/kb_cache/faiss_cache.py", line 102, in load_vector_store
    vector_store = self.new_vector_store(embed_model=embed_model, embed_device=embed_device)
  File "/home/ubuntu/Langchain-Chatchat/server/knowledge_base/kb_cache/faiss_cache.py", line 60, in new_vector_store
    vector_store = FAISS.from_documents([doc], embeddings, normalize_L2=True,distance_strategy="METRIC_INNER_PRODUCT")
  File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain_core/vectorstores.py", line 508, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/langchain_community/vectorstores/faiss.py", line 965, in from_texts
    embeddings = embedding.embed_documents(texts)
  File "/home/ubuntu/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 439, in embed_documents
    return normalize(embeddings).tolist()
  File "/home/ubuntu/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 37, in normalize
    norm = np.linalg.norm(embeddings, axis=1)
  File "<__array_function__ internals>", line 200, in norm
  File "/home/ubuntu/.local/lib/python3.10/site-packages/numpy/linalg/linalg.py", line 2541, in norm
    s = (x.conj() * x).real
TypeError: loop of ufunc does not support argument 0 of type NoneType which has no callable conjugate method
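
In the log above, the model name BAAI/bge-large-zh-v1.5 is not a local directory, so it appears to have been looked up on huggingface.co, and that connection failed with "Network is unreachable". A small pre-flight check along these lines can tell the two situations apart before init_database.py is run. Both helper names are hypothetical, and the sketch only tests a plain TCP connection, not proxies or TLS.

```python
import socket
from pathlib import Path


def classify_model_source(name_or_path):
    """Return "local" if the value is an existing directory, else "hub"."""
    return "local" if Path(name_or_path).is_dir() else "hub"


def hub_reachable(host="huggingface.co", port=443, timeout=5.0):
    """Return True if a TCP connection to the host can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, timeouts and
        # "Network is unreachable".
        return False


# Example use (the model name is the one from the log above):
# if classify_model_source("BAAI/bge-large-zh-v1.5") == "hub" and not hub_reachable():
#     print("Model would be fetched from the hub, but the hub is unreachable.")
```

If the hub is unreachable, pointing the embedding model setting at an already downloaded local directory avoids the lookup.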

zhangt-run commented 5 months ago

In my case it was caused by an unsuitable urllib3 version. After switching to a different version it worked.