chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and Llama) RAG and Agent app with langchain
Apache License 2.0
32.28k stars 5.6k forks source link

Model loaded successfully? Files cannot be imported. #107

Closed realalexsun closed 1 year ago

realalexsun commented 1 year ago

All models are stored locally. The LLM loads without problems.

After the embedding model loads, the following is printed:
No sentence-transformers model found with name GanymedeNil/text2vec-large-chinese. Creating a new one with MEAN pooling.
No sentence-transformers model found with name GanymedeNil/text2vec-large-chinese. Creating a new one with MEAN pooling.

Trying to load a file (the bundled state_of_the_search.txt) fails with the following error:

[nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1002)>
[nltk_data] Error loading averaged_perceptron_tagger: <urlopen error
[nltk_data]     [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify
[nltk_data]     failed: unable to get local issuer certificate
[nltk_data]     (_ssl.c:1002)>
content/state_of_the_search.txt 未能成功加载 [failed to load]
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gradio/routes.py", line 393, in run_predict
    output = await app.get_blocks().process_api(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gradio/blocks.py", line 1059, in process_api
    result = await self.call_function(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/gradio/blocks.py", line 868, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/Applications/#AS#/Warehouse/ChatGLM-6B/langchain-ChatGLM/webui.py", line 71, in get_vector_store
    vs_path = local_doc_qa.init_knowledge_vector_store(["content/" + filepath])
  File "/Applications/#AS#/Warehouse/ChatGLM-6B/langchain-ChatGLM/chains/local_doc_qa.py", line 80, in init_knowledge_vector_store
    vector_store = FAISS.from_documents(docs, self.embeddings)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/vectorstores/base.py", line 183, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 345, in from_texts
    return cls.__from(texts, embeddings, embedding, metadatas, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/vectorstores/faiss.py", line 307, in __from
    index = faiss.IndexFlatL2(len(embeddings[0]))
IndexError: list index out of range

Any help resolving this would be appreciated.
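The last frame of the traceback indexes embeddings[0], which raises IndexError when the list is empty. Since the file is also reported as not loaded, it seems likely (though this is not confirmed anywhere in the thread) that no documents reached the embedding step at all. A minimal sketch of a guard that turns that situation into a readable error; the function name embedding_dimension is hypothetical and is not part of langchain or this project:

```python
from typing import Sequence


def embedding_dimension(embeddings: Sequence[Sequence[float]]) -> int:
    """Return the vector size, failing with a clear message on empty input."""
    if not embeddings:
        # Same situation that makes embeddings[0] raise IndexError,
        # reported here with a hint about the probable cause.
        raise ValueError(
            "No embeddings were produced: the document list is empty, "
            "so check that the source files actually loaded."
        )
    return len(embeddings[0])
```

Calling embedding_dimension([[0.1, 0.2, 0.3]]) returns 3, while embedding_dimension([]) raises ValueError with the message above instead of a bare IndexError.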
imClumsyPanda commented 1 year ago

I suggest testing in a Python 3.8 environment.

iceqing commented 1 year ago

I used Python 3.8 as the documentation recommends and get the same error.

content/langchain-ChatGLM_README.md 未能成功加载 [failed to load]
Traceback (most recent call last):
  File "/home/ice/anaconda3/envs/py38/lib/python3.8/site-packages/gradio/routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/ice/anaconda3/envs/py38/lib/python3.8/site-packages/gradio/blocks.py", line 1302, in process_api
    result = await self.call_function(
  File "/home/ice/anaconda3/envs/py38/lib/python3.8/site-packages/gradio/blocks.py", line 1025, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/ice/anaconda3/envs/py38/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/ice/anaconda3/envs/py38/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/ice/anaconda3/envs/py38/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "webui.py", line 71, in get_vector_store
    vs_path = local_doc_qa.init_knowledge_vector_store(["content/" + filepath])
  File "/home/ice/ChatGLM/langchain-ChatGLM/chains/local_doc_qa.py", line 80, in init_knowledge_vector_store
    vector_store = FAISS.from_documents(docs, self.embeddings)
  File "/home/ice/anaconda3/envs/py38/lib/python3.8/site-packages/langchain/vectorstores/base.py", line 183, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/home/ice/anaconda3/envs/py38/lib/python3.8/site-packages/langchain/vectorstores/faiss.py", line 345, in from_texts
    return cls.__from(texts, embeddings, embedding, metadatas, **kwargs)
  File "/home/ice/anaconda3/envs/py38/lib/python3.8/site-packages/langchain/vectorstores/faiss.py", line 307, in __from
    index = faiss.IndexFlatL2(len(embeddings[0]))
IndexError: list index out of range
sdhou commented 1 year ago

I ran into the same problem.

chi-cat commented 1 year ago

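The NLTK data was not downloaded correctly. After downloading it manually, files load normally.

import nltk
# Replace the placeholder with your own proxy address,
# or drop this call if you have direct network access.
nltk.set_proxy("http://proxy-host:proxy-port")
nltk.download('punkt')  # sentence tokenizer data
nltk.download('averaged_perceptron_tagger')  # part-of-speech tagger data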
iceqing commented 1 year ago

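Following the README, I manually unzipped punkt and the tagger and placed them in the specified directories, which solved it. Thanks.

Unzip packages/tokenizers from punkt.zip and place it under the nltk_data/tokenizers storage path.

Download averaged_perceptron_tagger.zip, unzip it, and place it under the nltk_data/taggers storage path.

The nltk_data storage paths can be listed with nltk.data.path.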
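The manual steps above can be sketched in Python using only the standard library. The helper name unpack_nltk_zip is hypothetical, and the sketch assumes the archive's top-level folder is the resource folder itself (for example averaged_perceptron_tagger/); if your punkt.zip nests the data under packages/tokenizers as described above, extract that inner folder instead.

```python
import zipfile
from pathlib import Path


def unpack_nltk_zip(zip_path: str, nltk_data_dir: str, category: str) -> Path:
    """Extract zip_path into <nltk_data_dir>/<category> and return that folder."""
    target = Path(nltk_data_dir) / category
    # Create nltk_data/<category> if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For example, unpack_nltk_zip("averaged_perceptron_tagger.zip", "nltk_data", "taggers") would place the tagger under nltk_data/taggers, provided "nltk_data" is one of the directories listed in nltk.data.path.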

imClumsyPanda commented 1 year ago

The newly released v1.0.2 adds an nltk_data folder and sets it in the code as the default nltk_data path, which should resolve problems that can occur during the download.
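For readers on an older checkout, a rough sketch of the same idea: put a local folder at the front of NLTK's search list so it is consulted before any download is attempted. This is not the project's actual code. The helper names and the "nltk_data" folder location are assumptions; nltk.data.path itself is the real list NLTK searches.

```python
import os
from typing import List


def prefer_local_data_dir(search_path: List[str], data_dir: str) -> List[str]:
    """Put data_dir first in search_path (in place), without duplicating it."""
    absolute = os.path.abspath(data_dir)
    if absolute in search_path:
        search_path.remove(absolute)
    search_path.insert(0, absolute)
    return search_path


def use_bundled_nltk_data(data_dir: str = "nltk_data") -> List[str]:
    """Apply the helper to NLTK's own search list (requires nltk installed)."""
    import nltk  # imported lazily so the helper above works without nltk

    return prefer_local_data_dir(nltk.data.path, data_dir)
```

Calling use_bundled_nltk_data() before any document is loaded should be enough for NLTK to find punkt and the tagger locally, assuming the folder has the usual tokenizers/ and taggers/ layout.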

GreatWildFire commented 1 year ago

I hit the same error, and the methods above did not resolve it. Are there any other suggestions, or is the format of my txt data wrong (is there a reference data format)? Thank you very much.

content/data02.txt 未能成功加载 [failed to load]
Traceback (most recent call last):
  File "D:\anaconda3\envs\langchai\lib\site-packages\gradio\routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\anaconda3\envs\langchai\lib\site-packages\gradio\blocks.py", line 1302, in process_api
    result = await self.call_function(
  File "D:\anaconda3\envs\langchai\lib\site-packages\gradio\blocks.py", line 1025, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\anaconda3\envs\langchai\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\anaconda3\envs\langchai\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "D:\anaconda3\envs\langchai\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "E:/LLM/langchain-ChatGLM/webui.py", line 82, in get_vector_store
    vs_path = local_doc_qa.init_knowledge_vector_store(["content/" + filepath])
  File "E:\LLM\langchain-ChatGLM\chains\local_doc_qa.py", line 80, in init_knowledge_vector_store
    vector_store = FAISS.from_documents(docs, self.embeddings)
  File "D:\anaconda3\envs\langchai\lib\site-packages\langchain\vectorstores\base.py", line 116, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "D:\anaconda3\envs\langchai\lib\site-packages\langchain\vectorstores\faiss.py", line 345, in from_texts
    return cls.__from(texts, embeddings, embedding, metadatas, **kwargs)
  File "D:\anaconda3\envs\langchai\lib\site-packages\langchain\vectorstores\faiss.py", line 307, in __from
    index = faiss.IndexFlatL2(len(embeddings[0]))
IndexError: list index out of range

imClumsyPanda commented 1 year ago

Do the sample files load normally?

GreatWildFire commented 1 year ago

Do the sample files load normally?

Hello, the sample files cannot be loaded either.

content/langchain-ChatGLM_README.md 未能成功加载 [failed to load]
content/state_of_the_search.txt 未能成功加载 [failed to load]
Traceback (most recent call last):
  File "D:\anaconda3\envs\langchai\lib\site-packages\gradio\routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\anaconda3\envs\langchai\lib\site-packages\gradio\blocks.py", line 1302, in process_api
    result = await self.call_function(
  File "D:\anaconda3\envs\langchai\lib\site-packages\gradio\blocks.py", line 1025, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\anaconda3\envs\langchai\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\anaconda3\envs\langchai\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "D:\anaconda3\envs\langchai\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "E:/LLM/langchain-ChatGLM/webui.py", line 82, in get_vector_store
    vs_path = local_doc_qa.init_knowledge_vector_store(["content/" + filepath])
  File "E:\LLM\langchain-ChatGLM\chains\local_doc_qa.py", line 80, in init_knowledge_vector_store
    vector_store = FAISS.from_documents(docs, self.embeddings)
  File "D:\anaconda3\envs\langchai\lib\site-packages\langchain\vectorstores\base.py", line 116, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "D:\anaconda3\envs\langchai\lib\site-packages\langchain\vectorstores\faiss.py", line 345, in from_texts
    return cls.__from(texts, embeddings, embedding, metadatas, **kwargs)
  File "D:\anaconda3\envs\langchai\lib\site-packages\langchain\vectorstores\faiss.py", line 307, in __from
    index = faiss.IndexFlatL2(len(embeddings[0]))
IndexError: list index out of range

imClumsyPanda commented 1 year ago

Did you install all the dependencies according to requirements.txt?

GreatWildFire commented 1 year ago

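Did you install all the dependencies according to requirements.txt?

Hello, I rechecked the dependencies and confirmed that one important dependency was missing. After installing it, everything runs successfully. Thank you very much for your help.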

huangjiaheng commented 1 year ago

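Did you install all the dependencies according to requirements.txt?

Hello, I rechecked the dependencies and confirmed that one important dependency was missing. After installing it, everything runs successfully. Thank you very much for your help.

Which important dependency was it?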

GreatWildFire commented 1 year ago

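Did you install all the dependencies according to requirements.txt?

Hello, I rechecked the dependencies and confirmed that one important dependency was missing. After installing it, everything runs successfully. Thank you very much for your help.

Which important dependency was it?

It was unstructured[local-inference] from requirements.txt. I had ignored some problems when installing it earlier, and have now reinstalled it.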
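A quick way to spot this kind of silently failed install is to check which modules can actually be imported in the active environment. The sketch below uses only the standard library; the helper name missing_modules is hypothetical, and the module names in the usage note are assumed import names for the packages mentioned in this thread.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level names from module_names that cannot be imported."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in module_names if find_spec(name) is None]
```

For instance, missing_modules(["nltk", "faiss", "unstructured", "langchain"]) returns the subset that is not importable. Note that an importable unstructured does not prove the local-inference extra installed cleanly, so rerunning the install and reading its output is still worthwhile.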

flaviadeutsch commented 1 year ago

I installed the latest version and hit the same problem. All the files are present in the nltk_data directory.

chi-cat commented 1 year ago

https://github.com/imClumsyPanda/langchain-ChatGLM/blob/356a69a007d1cc74355b17b4446ed97630cdc36f/chains/local_doc_qa.py#L57

https://github.com/imClumsyPanda/langchain-ChatGLM/blob/356a69a007d1cc74355b17b4446ed97630cdc36f/chains/local_doc_qa.py#L68

https://github.com/imClumsyPanda/langchain-ChatGLM/blob/356a69a007d1cc74355b17b4446ed97630cdc36f/chains/local_doc_qa.py#L77

You could try printing the full details of the exceptions caught at these locations to get more detailed error information.
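One way to do that, sketched with the standard library only: keep the try/except around each file, but record and log the full traceback instead of only a "failed to load" message. The function load_all and its loader argument are hypothetical stand-ins, not the code at the linked lines.

```python
import logging
import traceback
from typing import Callable, Iterable, List, Tuple


def load_all(
    paths: Iterable[str], loader: Callable[[str], List[str]]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Load every path, collecting results and full tracebacks of failures."""
    loaded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for path in paths:
        try:
            loaded.extend(loader(path))
        except Exception:
            # format_exc() returns the complete traceback as a string.
            failed.append((path, traceback.format_exc()))
            # logging.exception also writes the traceback to the log.
            logging.exception("Failed to load %s", path)
    return loaded, failed
```

With this shape, the real cause (a missing NLTK resource, a missing dependency, or something else) shows up next to the file name instead of surfacing later as an IndexError.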