chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly langchain-ChatGLM): a local-knowledge-based RAG and Agent application built with Langchain on top of LLMs such as ChatGLM, Qwen, and Llama
Apache License 2.0

[BUG] Cannot use zhipu's online embedding #2465

Closed Chal1ce closed 10 months ago

Chal1ce commented 10 months ago

Online embedding via the zhipu API does not work. LLM chat works normally, but when importing files into the knowledge base, vectorization fails with an error saying an absolute path cannot be converted to a relative path.
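For context, the reported error message matches the kind of failure pathlib raises when a file's absolute path does not live under the knowledge-base directory. A minimal illustration (the paths here are hypothetical, not taken from the actual logs):

```python
from pathlib import Path

# Illustration only: Path.relative_to raises ValueError when asked to
# express an absolute path relative to a base it is not contained in.
base = Path("/repo/knowledge_base/samples/content")  # hypothetical KB content dir
target = Path("/tmp/uploads/doc.txt")                # hypothetical file elsewhere

try:
    rel = target.relative_to(base)
except ValueError as exc:
    print("cannot convert absolute path to relative path:", exc)
```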

Steps to reproduce

  1. In model_config.py, change the embedding model to zhipu-api
  2. Clear the knowledge_base folder and rebuild the vector store
  3. Start a lightweight deployment with python startup.py -a --lite
  4. Create a new knowledge base -> import files -> no vector files are generated
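Step 1 amounts to an edit along these lines; the key names below are assumptions modeled on model_config.py.example and may not match this exact version:

```python
# Hypothetical excerpt of configs/model_config.py (key names assumed).
EMBEDDING_MODEL = "zhipu-api"  # switch embeddings from a local model to the online zhipu API

ONLINE_LLM_MODEL = {
    "zhipu-api": {
        "api_key": "YOUR_ZHIPU_API_KEY",  # placeholder; required for online calls
        "version": "chatglm_turbo",
    },
}
```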

Expected Result: the vector store directory should contain both the source files and the generated vector files.

Actual Result: only the source files are present, no vector files. [screenshots of the error messages omitted]

Environment Information

dosubot[bot] commented 10 months ago

🤖

Hi @Chal1ce! I'm Dosu, a bot that can help you solve problems, answer questions, and even help you become a contributor. While you wait for a human maintainer, I'll do my best to assist you. I hope we work well together!

You cannot use the zhipu API for online embedding when importing knowledge base files because the recreate_vector_store function in server/knowledge_base/kb_doc_api.py has no parameter dedicated to selecting the zhipu API for online embedding.

In the function definition:

def recreate_vector_store(
        knowledge_base_name: str = Body(..., examples=["samples"]),
        allow_empty_kb: bool = Body(True),
        vs_type: str = Body(DEFAULT_VS_TYPE),
        embed_model: str = Body(EMBEDDING_MODEL),
        chunk_size: int = Body(CHUNK_SIZE, description="知识库中单段文本最大长度"),  # max length of a single text chunk
        chunk_overlap: int = Body(OVERLAP_SIZE, description="知识库中相邻文本重合长度"),  # overlap between adjacent chunks
        zh_title_enhance: bool = Body(ZH_TITLE_ENHANCE, description="是否开启中文标题加强"),  # enable Chinese title enhancement
        not_refresh_vs_cache: bool = Body(False, description="暂不保存向量库(用于FAISS)"),  # defer saving the vector store (for FAISS)
):

The embed_model parameter defaults to EMBEDDING_MODEL, and there is no option specifically for selecting the zhipu API for online embedding. So unless EMBEDDING_MODEL is set to use the zhipu API, you will not be able to use it for online embedding when importing knowledge base files.
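Given that signature, the embedding model can also be pinned per request instead of relying on the EMBEDDING_MODEL default. A sketch of such a request body (the field values are illustrative; only embed_model matters for this issue):

```python
import json

# Hypothetical JSON body for the recreate_vector_store endpoint,
# mirroring the parameters of the signature quoted above.
payload = {
    "knowledge_base_name": "samples",
    "allow_empty_kb": True,
    "vs_type": "faiss",
    "embed_model": "zhipu-api",  # explicit override of the EMBEDDING_MODEL default
    "chunk_size": 250,
    "chunk_overlap": 50,
    "zh_title_enhance": False,
    "not_refresh_vs_cache": False,
}
body = json.dumps(payload)
```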

As for how the recreate_vector_store function changed between the latest Langchain-Chatchat release (v0.2.8) and version 0.0.344, I could not find an answer in the repository; the answer may live elsewhere, or I may have missed it.

If you need more help or have any other questions, feel free to ask me anytime.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

Chal1ce commented 10 months ago

@dosu-bot EMBEDDING_MODEL is passed in from the config, and in the config I have already set it to use the zhipu API.

dosubot[bot] commented 10 months ago

🤖

Based on the context provided, it seems like the issue might be related to the configuration of the embedding model or the location of the vector file. Here are a few steps you can take to resolve the issue:

  1. Check the EMBEDDING_MODEL in your configuration. It should be set to "bge-large-zh" as per the provided context. If it's not, change it to "bge-large-zh". You can refer to the model_config.py.example for more details.

  2. Verify the location of the vector file. The MODEL_ROOT_PATH should be set to the directory where all your Embedding and LLM models are stored. If it's not set, or if it's set to an incorrect location, the program might not be able to find the vector file.

  3. Make sure the vector file is in the correct location. According to the MODEL_PATH dictionary, the "bge-large-zh" model should be located at "BAAI/bge-large-zh" relative to the MODEL_ROOT_PATH. If it's not there, move it to the correct location.

  4. If the vector file is in the correct location and the MODEL_ROOT_PATH is set correctly, but the program still can't find the file, it might be an issue with the permissions. Make sure the program has read access to the vector file and its directory.

If you've checked all these things and the issue still persists, it might be a bug in the program or an issue with your environment.
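Checks 2 through 4 above can be sketched as a quick script; MODEL_ROOT_PATH and the relative model path below are placeholders, not read from your actual config:

```python
import os

MODEL_ROOT_PATH = "/path/to/models"   # placeholder: your configured model root
model_rel_path = "BAAI/bge-large-zh"  # per the MODEL_PATH dictionary mentioned above

model_dir = os.path.join(MODEL_ROOT_PATH, model_rel_path)
print("exists:  ", os.path.isdir(model_dir))       # check 3: model in the right place
print("readable:", os.access(model_dir, os.R_OK))  # check 4: read permission
```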

As for the recreate_vector_store function in the server/knowledge_base/kb_doc_api.py file, it is used to recreate the vector store from the content. This function is useful when a user can copy files to the content folder directly instead of uploading them through the network. You can refer to the kb_doc_api.py for more details.

I also found a similar issue in the Langchain-Chatchat repository: 报错 raise ValueError(f"Knowledge base {knowledge_base_id} not found") ValueError: Knowledge base ./vector_store not found. The suggested solution was to update to the latest code. You might want to try that as well.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Sources



xiqukuwa commented 10 months ago

> @dosu-bot EMBEDDING_MODEL is passed in from the config, and in the config I have already set it to use the zhipu API.

Has this been solved?

Chal1ce commented 10 months ago

> > @dosu-bot EMBEDDING_MODEL is passed in from the config, and in the config I have already set it to use the zhipu API.
>
> Has this been solved?

Not yet. For now I have given up on using an online embedding model; let's see whether anyone here has run into this problem before.

dosubot[bot] commented 10 months ago

It seems like this issue requires a deeper look, @imClumsyPanda could you please lend your expertise?

zRzRzRzRzRzRzR commented 10 months ago

Starting from 3.x, model integration will be completely rewritten. Stay tuned.

xiangjieli96 commented 5 months ago

When calling chatglm4 through the API, why does the embedding error still appear?