chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly Langchain-ChatGLM): local-knowledge-based RAG and Agent applications built on Langchain with LLMs such as ChatGLM, Qwen, and Llama.
Apache License 2.0

Cannot load the JinaAI Embedding model #3278

Closed. Aikoin closed this issue 8 months ago.

Aikoin commented 8 months ago

Problem Description: I downloaded the jina-embeddings-v2-base-zh embedding model and updated model_config.py accordingly:

EMBEDDING_MODEL = "jina-embeddings-v2-base-zh"
MODEL_PATH = {
    "embed_model": {
        "jina-embeddings-v2-base-zh": "jinaai/jina-embeddings-v2-base-zh",
        # ... other embedding model entries unchanged ...
    },
    # ... rest of MODEL_PATH unchanged ...
}

Steps to Reproduce: The problem occurs while rebuilding the knowledge base (running python init_database.py --recreate-vs). The embedding model is loaded by this code in Langchain-Chatchat/server/knowledge_base/kb_cache/base.py:

from langchain.embeddings.huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name=get_model_path(model),
                                   model_kwargs={'device': device})

Expected Result: The documents in the knowledge base are vectorized correctly.

Actual Result: An error is raised:

~/workspace/Langchain-Chatchat$ python init_database.py --recreate-vs
recreating all vector stores
2024-03-12 14:47:07,321 - faiss_cache.py[line:92] - INFO: loading vector store in 'samples/vector_store/jina-embeddings-v2-base-zh' from disk.
2024-03-12 14:47:07,477 - SentenceTransformer.py[line:66] - INFO: Load pretrained SentenceTransformer: jina-embeddings-v2-base-zh
Could not locate the configuration_bert.py inside jinaai/jina-bert-implementation.
2024-03-12 14:47:17,496 - embeddings_api.py[line:39] - ERROR: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like jinaai/jina-bert-implementation is not the path to a directory containing a file named configuration_bert.py. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
AttributeError: 'NoneType' object has no attribute 'conjugate'

Environment Information

Additional Information: I searched online and others have run into the same problem, but none of the fixes I found worked for me. I also tried loading the model with AutoModel.from_pretrained() and similar approaches, including passing trust_remote_code=True, but nothing helped.
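For reference, one of those direct-load attempts looked roughly like this (a minimal sketch; the local path is a placeholder for wherever the model was downloaded):

# Standalone check, independent of Langchain-Chatchat: load the model directly
# with transformers. trust_remote_code lets transformers import the custom code
# the model references in jinaai/jina-bert-implementation.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "/path/to/jina-embeddings-v2-base-zh",  # placeholder local path
    trust_remote_code=True,
)
print(model.config.model_type)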

dosubot[bot] commented 8 months ago

Based on the error message you provided, the issue appears to be with loading the JinaAI embedding model: the configuration file configuration_bert.py could not be found under the path jinaai/jina-bert-implementation. This could be due to a few reasons:

  1. Incorrect Model Path: The path specified in MODEL_PATH might be incorrect. Please ensure that the path jinaai/jina-embeddings-v2-base-zh is correct and the model files are present in this directory.

  2. Internet Connection: The error message suggests that there might be an issue with the internet connection while trying to load the model from Hugging Face's model hub. Please ensure that you have a stable internet connection.

  3. Offline Mode: If you are running the library in offline mode, you need to ensure that the model files are already downloaded and available locally. You can refer to Hugging Face's offline-mode documentation for more details; see the download sketch after this list.

  4. Model Compatibility: The model jina-embeddings-v2-base-zh might not be compatible with the version of the library you are using. Please ensure that the model is compatible with your current library version.
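Regarding points 1 and 3 above, a minimal pre-download sketch (the target directories are placeholders; note that this model also references custom code from jinaai/jina-bert-implementation, which has to be downloaded or cached as well):

# Pre-download both the embedding model and the custom-code repository it
# references, so they can be loaded later without reaching huggingface.co.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="jinaai/jina-embeddings-v2-base-zh",
                  local_dir="/path/to/jina-embeddings-v2-base-zh")
snapshot_download(repo_id="jinaai/jina-bert-implementation",
                  local_dir="/path/to/jina-bert-implementation")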

As for the AttributeError: 'NoneType' object has no attribute 'conjugate', this is most likely a downstream symptom: because the model failed to load, the embedding call returned None, and a later numeric step (likely the embedding normalization) tried to operate on it. Fixing the load failure above should make this error go away.

Other users of the Langchain-Chatchat repository have faced similar problems when loading models. The solutions discussed in the following issues might be helpful for you:

  1. Issue #3208 suggests checking the initialization of the embedding model, reviewing the normalize function, verifying the compatibility of the embedding model, updating the dependencies, debugging the initialization parameters, and consulting the documentation.

  2. Issue #3061 suggests checking the normalize function, ensuring the sentence-transformers package is correctly installed, manually downloading the bge-large-zh model, modifying the model_config.py file, checking the network connection, considering downgrading the langchain-community package, and checking if all necessary files are present.

  3. Issue #1999 suggests modifying the source code of the dependency fschat to include the trust_remote_code=True parameter.

  4. Issue #278 suggests checking the model reference in Hugging Face.

I hope this information helps you in resolving the issue. If you have any more questions or need further clarification, feel free to ask.


zRzRzRzRzRzRzR commented 8 months ago

This model does not seem to be on the supported list.

Aikoin commented 8 months ago

This model does not seem to be on the supported list.

I've solved it now. Besides the model files you also need to download the configuration files, upgrade sentence-transformers to the latest version, and add the following at ./server/knowledge_base/kb_cache/base.py line 151:

elif 'bge-' in model:
    ...
############### added code ###############
elif 'jina-' in model:
    from langchain.embeddings.huggingface import HuggingFaceEmbeddings
    embeddings = HuggingFaceEmbeddings(model_name=get_model_path(model),
                                       model_kwargs={'device': device, 'trust_remote_code': True})
############### added code ###############
else:
    ...
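For context, a sketch of what that wrapper call does under the hood (not the project's own code; the path is a placeholder): HuggingFaceEmbeddings forwards model_kwargs to the SentenceTransformer constructor, and SentenceTransformer only accepts trust_remote_code in recent releases, which is why the sentence-transformers upgrade is part of the fix.

# Rough equivalent of the HuggingFaceEmbeddings call above.
from sentence_transformers import SentenceTransformer

client = SentenceTransformer("/path/to/jina-embeddings-v2-base-zh",  # placeholder local path
                             device="cuda",           # or "cpu"
                             trust_remote_code=True)  # needed for Jina's custom model code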
sevenandseven commented 6 months ago

(quoting Aikoin's fix above: download the configuration files as well, upgrade sentence-transformers, and add the 'jina-' branch in base.py)

Hello, how should the corresponding reranker model be loaded?

anmao commented 5 months ago

(quoting the fix and the reranker question above)

Modify the model's config.json file. Its auto_map field stores the information about the model code it depends on. Change the part of each string before the "--" to the model's local path, e.g. change "BAAI/bge-reranker-v2-minicpm-layerwise" to "/root/models/reranker/bge-reranker-v2-minicpm-layerwise".
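A minimal sketch of that edit in script form (assuming the model has been copied to the local directory shown, and that the auto_map values follow the usual "<repo-or-path>--<module>.<Class>" layout):

# Rewrite the prefix before "--" in every auto_map entry to the local model path.
import json

local_dir = "/root/models/reranker/bge-reranker-v2-minicpm-layerwise"
config_path = local_dir + "/config.json"

with open(config_path, "r", encoding="utf-8") as f:
    cfg = json.load(f)

for key, value in cfg.get("auto_map", {}).items():
    if "--" in value:
        # keep the "<module>.<Class>" part, swap the prefix for the local path
        cfg["auto_map"][key] = local_dir + "--" + value.split("--", 1)[1]

with open(config_path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, ensure_ascii=False, indent=2)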

RhonLiu commented 3 months ago

You are all so kind. This bug had me stuck for a week, and it's finally fixed. There really are plenty of good people out there.