chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge-based RAG and Agent application built with Langchain and LLMs such as ChatGLM, Qwen, and Llama.
Apache License 2.0

python init_database.py --recreate-vs errors out #2261

Closed · daimy666 closed this issue 12 months ago

daimy666 commented 1 year ago

Problem Description: An error occurs when running `python init_database.py --recreate-vs`.

Error message (excerpt):

RapidOCRPDFLoader context page index: 5:  62%|████████████▉ | 5/8 [01:28<00:56, 18.85s/it]
文档切分示例:page_content='Vamana\n这个算法和NSG[2][4]思路比较像(不了解NSG的可以看参考文献2,不想读paper的话可以\n看参考文献4),主要区别在于裁边策略。准确的说是给NSG的裁边策略上加了一个开关\nalpha。NSG的裁边策略主要思路是:对于目标点邻居的选择尽可能多样化,如果新邻居相比目标\n点,更靠近目标点的某个邻居,我们可以不必将这个点加入邻居点集中。也就是说,对于目标点的\n每个邻居节点,周围方圆dist(目标点,邻居点)范围内不能有其他邻居点。这个裁边策略有效控\n制了图的出度,并且比较激进,所以减少了索引的内存占用,提高了搜索速度,但同时也降低了搜\n索精度。Vamana的裁边策略其实就是通过参数alpha自由控制裁边的尺度。具体作用原理是给' metadata={'source': '/home/b1006/dmy/Langchain-Chatchat/knowledge_base/samples/content/llm/img/大模型应用技术原理-幕布图片-580318-260070.jpg'}
RapidOCRPDFLoader context page index: 7: 100%|████████████████████| 8/8 [01:30<00:00, 11.37s/it]
文档切分示例:page_content='See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/372669736\nCreating Large Language Model Applications Utilizing LangChain: A Primer on\nDeveloping LLM Apps Fast\nArticle\xa0\xa0in\xa0\xa0International Conference on Applied Engineering and Natural Sciences · July 2023\nDOI: 10.59287/icaens.1127\nCITATIONS\n0\nREADS\n47\n2 authors:\nSome of the authors of this publication are also working on these related projects:\nTHALIA: Test Harness for the Assessment of Legacy Information Integration Approaches View project\nAnalysis of Feroresonance with Signal Processing Technique View project\nOguzhan Topsakal' metadata={'source': '/home/b1006/dmy/Langchain-Chatchat/knowledge_base/samples/content/test_files/langchain.pdf'}

Traceback (most recent call last):
  File "/home/b1006/dmy/Langchain-Chatchat/init_database.py", line 108, in <module>
    folder2db(kb_names=args.kb_name, mode="recreate_vs", embed_model=args.embed_model)
  File "/home/b1006/dmy/Langchain-Chatchat/server/knowledge_base/migrate.py", line 128, in folder2db
    files2vs(kb_name, kb_files)
  File "/home/b1006/dmy/Langchain-Chatchat/server/knowledge_base/migrate.py", line 113, in files2vs
    kb.add_doc(kb_file=kb_file, not_refresh_vs_cache=True)
  File "/home/b1006/dmy/Langchain-Chatchat/server/knowledge_base/kb_service/base.py", line 130, in add_doc
    doc_infos = self.do_add_doc(docs, **kwargs)
  File "/home/b1006/dmy/Langchain-Chatchat/server/knowledge_base/kb_service/faiss_kb_service.py", line 74, in do_add_doc
    ids = vs.add_embeddings(text_embeddings=zip(data["texts"], data["embeddings"]),
TypeError: 'NoneType' object is not subscriptable
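The final TypeError means that `data` is None at the point where `vs.add_embeddings(...)` is called, i.e. the embedding step produced no "texts"/"embeddings" before FAISS was ever reached. Below is a minimal fail-fast sketch for that situation; the dictionary keys come from the traceback, but the helper name and error message are assumptions, not code from `faiss_kb_service.py`.

```python
from typing import Optional


def ensure_embedding_data(data: Optional[dict]) -> dict:
    """Fail fast with a readable error instead of a bare TypeError.

    `data` is assumed to be the dict of "texts"/"embeddings" that do_add_doc
    hands to FAISS; if the embedding model fails (e.g. an unreachable API),
    it can come back as None and the original line then raises
    "'NoneType' object is not subscriptable".
    """
    if not data or not data.get("texts") or not data.get("embeddings"):
        raise RuntimeError(
            "Embedding step produced no vectors; check that EMBEDDING_MODEL "
            "is configured correctly and that the embedding model/API loads."
        )
    return data
```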

liunux4odoo commented 1 year ago

Please post the complete error message.

yifei-lu commented 11 months ago

I hit the same error. I'm using the lightweight deployment installed from requirements_lite.txt, with both EMBEDDING_MODEL and LLM_MODELS set to ["zhipu-api"]. This did not happen on 0.2.7; it started with 0.2.8. I'll open a separate issue for the API-call problem.
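Since both failing setups point at an API-based embedding model, one way to narrow this down is to call the embedding backend directly, outside init_database.py, and confirm it returns vectors at all. A generic sketch (not project code) that works for any langchain Embeddings object, whichever backend the knowledge-base configuration resolves to:

```python
from langchain.embeddings.base import Embeddings


def check_embeddings(embedder: Embeddings) -> None:
    """Smoke-test an embeddings backend before rebuilding the vector store.

    Pass in the same Embeddings object your configuration resolves to
    (a local model, a zhipu-api wrapper, ...). If this raises or returns
    nothing, init_database.py --recreate-vs will fail in the same way.
    """
    vectors = embedder.embed_documents(["测试", "test"])
    if not vectors or not vectors[0]:
        raise RuntimeError("embedding backend returned no vectors")
    print(f"ok: {len(vectors)} vectors, dim={len(vectors[0])}")
```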

masterDXP commented 8 months ago

Same error here; has anyone solved it?

Traceback (most recent call last):
  File "E:\QWen\Langchain\Langchain-Chatchat\init_database.py", line 107, in <module>
    folder2db(kb_names=args.kb_name, mode="recreate_vs", embed_model=args.embed_model)
  File "E:\QWen\Langchain\Langchain-Chatchat\server\knowledge_base\migrate.py", line 128, in folder2db
    files2vs(kb_name, kb_files)
  File "E:\QWen\Langchain\Langchain-Chatchat\server\knowledge_base\migrate.py", line 113, in files2vs
    kb.add_doc(kb_file=kb_file, not_refresh_vs_cache=True)
  File "E:\QWen\Langchain\Langchain-Chatchat\server\knowledge_base\kb_service\base.py", line 131, in add_doc
    doc_infos = self.do_add_doc(docs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\QWen\Langchain\Langchain-Chatchat\server\knowledge_base\kb_service\faiss_kb_service.py", line 78, in do_add_doc
    ids = vs.add_embeddings(text_embeddings=zip(data["texts"], data["embeddings"]),

Nancy7zt commented 3 months ago

Has this issue been resolved?