FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs
MIT License

BGE-M3 model fails to load #502

Open Xls1994 opened 7 months ago

Xls1994 commented 7 months ago

When I call the BGE M3 model with the official example code, I get the error below. It looks like it is caused by the transformers version.

sentence-transformers 2.2.2, transformers 4.34.1, FlagEmbedding 1.2.5
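
For anyone reproducing this, the version list above can be collected programmatically instead of by hand. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of FlagEmbedding.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string}, or "not installed" if missing.

    Hypothetical helper for bug reports; not part of any library in this thread.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# The three packages listed in this report.
report = installed_versions(["sentence-transformers", "transformers", "FlagEmbedding"])
for name, version in report.items():
    print(f"{name} {version}")
```

Running it inside the affected environment prints one `name version` line per package, which can be pasted straight into the issue.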

Traceback (most recent call last):
  File "bge_m3_embedding_model.py", line 7, in <module>
    model = BGEM3FlagModel(BGE_M3_PATH,
  File "/home/deploy/anaconda3/envs/yyl_env_py388/lib/python3.8/site-packages/FlagEmbedding/bge_m3.py", line 36, in __init__
    self.model = BGEM3ForInference(
  File "/home/deploy/anaconda3/envs/yyl_env_py388/lib/python3.8/site-packages/FlagEmbedding/BGE_M3/modeling.py", line 40, in __init__
    self.load_model(model_name, colbert_dim=colbert_dim)
  File "/home/deploy/anaconda3/envs/yyl_env_py388/lib/python3.8/site-packages/FlagEmbedding/BGE_M3/modeling.py", line 77, in load_model
    self.tokenizer = AutoTokenizer.from_pretrained(model_name)
  File "/home/deploy/anaconda3/envs/yyl_env_py388/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 751, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/deploy/anaconda3/envs/yyl_env_py388/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2017, in from_pretrained
    return cls._from_pretrained(
  File "/home/deploy/anaconda3/envs/yyl_env_py388/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2242, in _from_pretrained
    init_kwargs[key] = added_tokens_map.get(init_kwargs[key], init_kwargs[key])
TypeError: unhashable type: 'dict'
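
The failing line passes `init_kwargs[key]` to `added_tokens_map.get(...)`, so the `TypeError` suggests that one of the special-token entries loaded from the model's `tokenizer_config.json` is a dict rather than a string. The sketch below lists such entries so the guess can be checked. The key list and the helper names are assumptions made for illustration, not something confirmed in this thread.

```python
import json
from pathlib import Path

# Assumption: the usual special-token fields of a Hugging Face tokenizer config.
SPECIAL_TOKEN_KEYS = (
    "bos_token", "eos_token", "unk_token", "sep_token",
    "pad_token", "cls_token", "mask_token",
)


def dict_valued_special_tokens(config):
    """Return the special-token keys whose value is a dict (unhashable)."""
    return [key for key in SPECIAL_TOKEN_KEYS if isinstance(config.get(key), dict)]


def inspect_tokenizer_config(model_dir):
    """Load <model_dir>/tokenizer_config.json and report dict-valued special tokens."""
    config_path = Path(model_dir) / "tokenizer_config.json"
    with config_path.open(encoding="utf-8") as handle:
        return dict_valued_special_tokens(json.load(handle))
```

Calling `inspect_tokenizer_config(BGE_M3_PATH)` on the local model directory returns a list of key names. A non-empty list would be consistent with the traceback above, though it does not by itself prove the cause.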

Code

from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel(BGE_M3_PATH,
                       use_fp16=True)  # Setting use_fp16 to True speeds up computation with a slight performance degradation

sentences_1 = ["What is BGE M3?", "Definition of BM25"]
sentences_2 = [
    "BGE M3 is an embedding model supporting dense retrieval, lexical matching and multi-vector interaction.",
    "BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document"]

output_1 = model.encode(sentences_1, return_dense=True, return_sparse=True, return_colbert_vecs=False)
output_2 = model.encode(sentences_2, return_dense=True, return_sparse=True, return_colbert_vecs=False)

# you can see the weight for each token:
print(model.convert_id_to_token(output_1['lexical_weights']))

staoxiao commented 7 months ago

Yes. You can upgrade transformers to a newer version, for example 4.37.
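
A small guard can make this requirement explicit before the model is loaded. This is a sketch only: the 4.37 threshold is the version suggested in this comment, not a verified minimum, and `parse_version` and `meets_minimum` are hypothetical helpers.

```python
import re


def parse_version(text):
    """Turn '4.37.0' or '4.38.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = re.match(r"\d+", piece)
        if digits is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="4.37"):
    """Return True if the installed version is at least the suggested minimum."""
    return parse_version(installed) >= parse_version(minimum)
```

For example, `meets_minimum("4.34.1")` is `False` and `meets_minimum("4.37.0")` is `True`. In practice the installed string would come from `transformers.__version__`.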

Xls1994 commented 7 months ago

Is there a recommended version? Upgrading arbitrarily could break our other environments.

JiwenZ commented 6 months ago

> Yes. You can upgrade transformers to a newer version, for example 4.37.

After upgrading to 4.37.0 and 4.38.2, I get this error instead:

File /root/anaconda3/envs/rerank-rom/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py:111, in PreTrainedTokenizerFast.__init__(self, *args, **kwargs)
    108     fast_tokenizer = copy.deepcopy(tokenizer_object)
    109 elif fast_tokenizer_file is not None and not from_slow:
    110     # We have a serialization from tokenizers which let us directly build the backend
--> 111     fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
    112 elif slow_tokenizer is not None:
    113     # We need to convert a slow tokenizer to build the backend
    114     fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)

Exception: expected value at line 1 column 1

hanhainebula commented 6 months ago

> > Yes. You can upgrade transformers to a newer version, for example 4.37.
>
> After upgrading to 4.37.0 and 4.38.2, I get this error instead: `Exception: expected value at line 1 column 1` (raised from `TokenizerFast.from_file(fast_tokenizer_file)` in transformers/tokenization_utils_fast.py, line 111)

I tested with transformers 4.37.0 and did not run into this problem.
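
Since the error does not reproduce with the same transformers version, the local model files are worth checking. `expected value at line 1 column 1` appears to be a JSON parse error raised while reading the tokenizer file, which would fit a `tokenizer.json` whose first character is not valid JSON, such as a Git LFS pointer left behind by an incomplete download. That cause is an assumption and is not confirmed in this thread. The sketch below, with the hypothetical helper `diagnose_json_file`, checks for it.

```python
import json
from pathlib import Path

# First bytes of a Git LFS pointer file (the real content was never fetched).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/"


def diagnose_json_file(path):
    """Return "ok" or a short description of why the file will not parse as JSON."""
    path = Path(path)
    if not path.is_file():
        return "missing"
    raw = path.read_bytes()
    if not raw.strip():
        return "empty"
    if raw.startswith(LFS_POINTER_PREFIX):
        return "git-lfs pointer (file contents were not downloaded)"
    try:
        json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as error:
        return f"invalid JSON: {error}"
    return "ok"
```

Running `diagnose_json_file(Path(model_dir) / "tokenizer.json")` against the local model directory shows whether the file needs to be downloaded again. Any result other than `"ok"` points to the files rather than to the transformers version.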