chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge-based RAG and Agent application built with Langchain and language models such as ChatGLM, Qwen, and Llama
Apache License 2.0

[BUG] Error when initializing data after switching to the Milvus vector store #3180

Closed jiyang2024 closed 7 months ago

jiyang2024 commented 8 months ago

After switching to the Milvus vector store, running `python init_database.py --recreate-vs` to initialize the data fails. Error message:

    Traceback (most recent call last):
      File "/root/jgy/Langchain-Chatchat/init_database.py", line 107, in <module>
        folder2db(kb_names=args.kb_name, mode="recreate_vs", embed_model=args.embed_model)
      File "/root/jgy/Langchain-Chatchat/server/knowledge_base/migrate.py", line 128, in folder2db
        files2vs(kb_name, kb_files)
      File "/root/jgy/Langchain-Chatchat/server/knowledge_base/migrate.py", line 113, in files2vs
        kb.add_doc(kb_file=kb_file, not_refresh_vs_cache=True)
      File "/root/jgy/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 131, in add_doc
        doc_infos = self.do_add_doc(docs, **kwargs)
      File "/root/jgy/Langchain-Chatchat/server/knowledge_base/kb_service/milvus_kb_service.py", line 83, in do_add_doc
        ids = self.milvus.add_documents(docs)
      File "/root/anaconda3/envs/env_rag/lib/python3.10/site-packages/langchain_core/vectorstores.py", line 119, in add_documents
        return self.add_texts(texts, metadatas, **kwargs)
      File "/root/anaconda3/envs/env_rag/lib/python3.10/site-packages/langchain_community/vectorstores/milvus.py", line 520, in add_texts
        assert isinstance(
    AssertionError: A list of valid ids are required when auto_id is False.

dosubot[bot] commented 8 months ago

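Hello @jiyang2024! Good to see you again; I hope all is well.

The "A list of valid ids are required when auto_id is False" error you hit while initializing the database after switching to the Milvus vector store is probably caused by documents being added to Milvus without explicit IDs while Milvus's auto_id is set to False.

In Milvus, when you add documents (vectors) to a collection, you can either set auto_id to True so that Milvus generates a unique ID for each document automatically, or set auto_id to False and supply your own unique ID for each document. The error message indicates that auto_id is set to False, but the code that adds documents to the database does not provide a list of IDs.

To resolve this, you need to modify the add_doc method of the MilvusKBService class so that it supplies an ID for every document being added when auto_id is False. If MilvusKBService resembles the other KB service classes and does not currently handle IDs, you need to adjust it either to generate IDs in the application or to change the configuration so that Milvus handles ID generation itself with auto_id set to True.

If you control how the Milvus collection is created in the application, you can make sure auto_id is set to True at creation time. Milvus will then assign IDs to documents automatically, and you no longer need to specify them by hand. If you have a specific reason to keep auto_id set to False, however, you will need to change the document insertion logic to include a unique ID for each document.

Because the provided context includes neither the implementation of the MilvusKBService class nor how the Milvus collection is configured, I cannot suggest specific code changes. You will need to review the MilvusKBService class, especially the add_doc method, and make sure it handles IDs correctly for your auto_id configuration.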
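The check behind this error, and one way to satisfy it, can be sketched in plain Python. This is a simplified model rather than the langchain_community code itself: `check_ids` and `make_ids` are hypothetical helpers, the file path is made up, and only the assertion message is taken from the traceback above.

```python
import uuid
from typing import List, Optional


def check_ids(texts: List[str], ids: Optional[List[str]], auto_id: bool) -> None:
    """Simplified model of the guard seen in the traceback (not the library code)."""
    if not auto_id:
        # With auto_id=False the caller must supply one ID per text.
        assert isinstance(ids, list) and len(ids) == len(texts), (
            "A list of valid ids are required when auto_id is False."
        )


def make_ids(source: str, texts: List[str]) -> List[str]:
    """Hypothetical helper: derive a stable, unique ID for each chunk of a file."""
    return [
        uuid.uuid5(uuid.NAMESPACE_URL, f"{source}#{i}").hex
        for i, _ in enumerate(texts)
    ]


chunks = ["first chunk", "second chunk"]
ids = make_ids("samples/test.txt", chunks)
check_ids(chunks, ids, auto_id=False)  # passes: IDs were supplied
check_ids(chunks, None, auto_id=True)  # passes: the store would generate IDs
```

If IDs are generated in the application like this, they would still have to reach the store, for example through an `ids` keyword that `add_documents` forwards to `add_texts`, provided the installed version accepts one.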

jiyang2024 commented 7 months ago

Hello. During initialization I could not find either the Milvus database or the collection. How do I change the collection's auto_id setting to True?

hollowdjj commented 7 months ago

In server/knowledge_base/kb_service/milvus_kb_service.py, add an argument after line 56 that sets auto_id to True.
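A minimal sketch of that change, assuming the service builds its langchain `Milvus` store from keyword arguments. `build_milvus_kwargs`, the collection name, and the connection values are hypothetical; `auto_id` is the constructor argument the suggestion refers to, and whether it is accepted depends on the installed langchain-community version.

```python
from typing import Any, Dict


def build_milvus_kwargs(collection_name: str, connection_args: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: collect the keyword arguments for the Milvus store."""
    return {
        "collection_name": collection_name,
        "connection_args": connection_args,
        # Let Milvus generate primary keys, so no ID list is needed on insert.
        "auto_id": True,
    }


kwargs = build_milvus_kwargs("samples", {"host": "127.0.0.1", "port": "19530"})
# In the service this would be passed to the store roughly as:
#   Milvus(embedding_function=embeddings, **kwargs)
print(kwargs["auto_id"])
```

Note that auto_id is part of a collection's schema and is fixed when the collection is created, so a collection that already exists would likely need to be dropped and recreated for the change to take effect.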

MuuuShin commented 7 months ago

Hello, I still get an error after setting auto_id to True. How did you resolve it?

Error details (earlier output omitted):

    2024-03-26 19:18:34,229 - tokenization_chatglm.py[line:164] - WARNING: Setting eos_token is not supported, use the default one.
    2024-03-26 19:18:34,229 - tokenization_chatglm.py[line:160] - WARNING: Setting pad_token is not supported, use the default one.
    2024-03-26 19:18:34,229 - tokenization_chatglm.py[line:156] - WARNING: Setting unk_token is not supported, use the default one.
    Document split example: page_content='ChatGPT是OpenAI开发的一个大型语言模型,可以提供各种主题的信息,\n# 如何向 ChatGPT 提问以获得高质量答案:提示技巧工程完全指南\n## 介绍\n我很高兴欢迎您阅读我的最新书籍《The Art of Asking ChatGPT for High-Quality Answers: A complete Guide to Prompt Engineering Techniques》。本书是一本全面指南,介绍了各种提示技术,用于从ChatGPT中生成高质量的答案。\n我们将探讨如何使用不同的提示工程技术来实现不同的目标。ChatGPT是一款最先进的语言模型,能够生成类似人类的文本。然而,理解如何正确地向ChatGPT提问以获得我们所需的高质量输出非常重要。而这正是本书的目的。' metadata={'source': '/data/bch/LLM/Langchain-Chatchat/knowledge_base/samples/content/test_files/test.txt'}
    2024-03-26 19:18:34,833 - tokenization_chatglm.py[line:164] - WARNING: Setting eos_token is not supported, use the default one.
    2024-03-26 19:18:34,834 - tokenization_chatglm.py[line:160] - WARNING: Setting pad_token is not supported, use the default one.
    2024-03-26 19:18:34,834 - tokenization_chatglm.py[line:156] - WARNING: Setting unk_token is not supported, use the default one.
    Document split example: page_content='See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/372669736\nCreating Large Language Model Applications Utilizing LangChain: A Primer on\nDeveloping LLM Apps Fast\nArticle\xa0\xa0in\xa0\xa0International Conference on Applied Engineering and Natural Sciences · July 2023\nDOI: 10.59287/icaens.1127\nCITATIONS\n0\nREADS\n47\n2 authors:\nSome of the authors of this publication are also working on these related projects:\nTHALIA: Test Harness for the Assessment of Legacy Information Integration Approaches View project\nAnalysis of Feroresonance with Signal Processing Technique View project\nOguzhan Topsakal' metadata={'source': '/data/bch/LLM/Langchain-Chatchat/knowledge_base/samples/content/test_files/langchain.pdf'}
    2024-03-26 19:18:35,221 - tokenization_chatglm.py[line:164] - WARNING: Setting eos_token is not supported, use the default one.
    2024-03-26 19:18:35,222 - tokenization_chatglm.py[line:160] - WARNING: Setting pad_token is not supported, use the default one.
    2024-03-26 19:18:35,222 - tokenization_chatglm.py[line:156] - WARNING: Setting unk_token is not supported, use the default one.
    Document split example: page_content='BoolQ\nPIQA\nSIQA\nHella-Swag\nARC-e\nARC-c\nNQ\nTQA\nMMLU\nGSM8K\nHuman-Eval\nMHA\n71.0\n79.3\n48.2\n75.1\n71.2\n43.0\n12.4\n44.7\n28.0\n4.9\n7.9\nMQA\n70.6' metadata={'source': '/data/bch/LLM/Langchain-Chatchat/knowledge_base/samples/content/llm/img/大模型推理优化策略-幕布图片-930255-616209.jpg'}
    Batches: 100%|██████████| 11/11 [00:02<00:00, 4.50it/s]
    2024-03-26 19:18:38,421 - decorators.py[line:139] - ERROR: RPC error: [create_index], ,
    Traceback (most recent call last):
      File "/data/bch/LLM/Langchain-Chatchat/init_database.py", line 107, in <module>
        folder2db(kb_names=args.kb_name, mode="recreate_vs", embed_model=args.embed_model)
      File "/data/bch/LLM/Langchain-Chatchat/server/knowledge_base/migrate.py", line 128, in folder2db
        files2vs(kb_name, kb_files)
      File "/data/bch/LLM/Langchain-Chatchat/server/knowledge_base/migrate.py", line 113, in files2vs
        kb.add_doc(kb_file=kb_file, not_refresh_vs_cache=True)
      File "/data/bch/LLM/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 131, in add_doc
        doc_infos = self.do_add_doc(docs, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/data/bch/LLM/Langchain-Chatchat/server/knowledge_base/kb_service/milvus_kb_service.py", line 84, in do_add_doc
        ids = self.milvus.add_documents(docs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/data/bch/miniconda3/envs/LLMEnv/lib/python3.11/site-packages/langchain_core/vectorstores.py", line 119, in add_documents
        return self.add_texts(texts, metadatas, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/data/bch/miniconda3/envs/LLMEnv/lib/python3.11/site-packages/langchain_community/vectorstores/milvus.py", line 586, in add_texts
        insert_list = [insert_dict[x][i:end] for x in self.fields]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/data/bch/miniconda3/envs/LLMEnv/lib/python3.11/site-packages/langchain_community/vectorstores/milvus.py", line 586, in <listcomp>
        insert_list = [insert_dict[x][i:end] for x in self.fields]
                       ~~~~~~~~~~~^^^
    KeyError: 'pk'

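The failing line in this traceback can be reproduced in isolation. The sketch below is a toy model, not the library code: `fields` and `insert_dict` are stand-ins with made-up values. It only shows that the comprehension raises `KeyError: 'pk'` whenever the field list contains `pk` but no `pk` column was prepared for insertion. Why the two disagree in this setup is not established in the thread.

```python
# Stand-in for self.fields: field names read from the collection.
fields = ["pk", "text", "vector"]

# Stand-in for insert_dict: columns prepared for this insert, without "pk".
insert_dict = {
    "text": ["chunk a", "chunk b"],
    "vector": [[0.1, 0.2], [0.3, 0.4]],
}

i, end = 0, 2
missing = None
try:
    # Same shape as the line at milvus.py line 586 in the traceback.
    insert_list = [insert_dict[x][i:end] for x in fields]
except KeyError as err:
    missing = err.args[0]
    print(f"missing column: {missing}")
```

One plausible reading, which would need checking against the actual collection, is that a collection created earlier with auto_id set to False still expects a `pk` column on insert.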
AIdrinkhotWater commented 6 months ago

I'm running into this too; same question.

MuuuShin commented 6 months ago

> I'm running into this too; same question.

You can take a look at the issue I opened.

AIdrinkhotWater commented 6 months ago

Changing the versions to langchain 0.0.354, langchain-community 0.0.20, and langchain-core 0.1.23 fixed it for me. I am using the Langchain-Chatchat 0.2.10 code, and my Milvus version is 2.2.13.
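To compare an environment against those versions, a small check along these lines may help. It is only a sketch: `PINNED` copies the versions from the comment above, while `installed_version` and `find_mismatches` are hypothetical helper names built on the standard `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Versions reported as working in this thread.
PINNED: Dict[str, str] = {
    "langchain": "0.0.354",
    "langchain-community": "0.0.20",
    "langchain-core": "0.1.23",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from the pin to what was found."""
    found = {name: lookup(name) for name in pinned}
    return {name: got for name, got in found.items() if got != pinned[name]}


for name, got in find_mismatches(PINNED).items():
    print(f"{name}: installed {got}, expected {PINNED[name]}")
```

The script prints nothing when all three packages match the pinned versions.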