FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Tokenizer error after fine-tuning a BGE model #182

Open linchen111 opened 1 year ago

linchen111 commented 1 year ago

Exception                                 Traceback (most recent call last)
/data_lc/LLM/Sentence_transformers/test.ipynb Cell 1 line 2
      1 from FlagEmbedding import FlagModel
----> 2 model = FlagModel('Models/bge-large-zh-v1.5-finetune/', query_instruction_for_retrieval="Represent this sentence for searching relevant passages",
      3                   use_fp16=False)  # setting use_fp16 to True speeds up inference, with a negligible drop in quality

File /data_lc/envs/sbert/lib/python3.8/site-packages/FlagEmbedding/flag_models.py:19, in FlagModel.__init__(self, model_name_or_path, pooling_method, normalize_embeddings, query_instruction_for_retrieval, use_fp16)
     10 def __init__(
     11     self,
     12     model_name_or_path: str = None,
    (...)
     16     use_fp16: bool = True
     17 ) -> None:
---> 19     self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
     20     self.model = AutoModel.from_pretrained(model_name_or_path)
     21     self.query_instruction_for_retrieval = query_instruction_for_retrieval

File /data_lc/envs/sbert/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py:751, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    747 if tokenizer_class is None:
    748     raise ValueError(
    749         f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    750     )
--> 751 return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    753 # Otherwise we have to be creative.
...
    112 elif slow_tokenizer is not None:
    113     # We need to convert a slow tokenizer to build the backend
    114     fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
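The traceback ends inside transformers' slow-to-fast tokenizer conversion, which points to missing or incomplete tokenizer files in the fine-tuned output directory. A quick sanity check (a sketch only; the directory path is taken from the traceback above, and the file list is what a stock bge-large-zh-v1.5 checkpoint normally ships with):

```python
import os

finetuned_dir = "Models/bge-large-zh-v1.5-finetune/"  # path from the traceback above

# Tokenizer files a stock bge-large-zh-v1.5 checkpoint normally contains.
expected = ["tokenizer_config.json", "vocab.txt", "special_tokens_map.json", "tokenizer.json"]
for name in expected:
    status = "present" if os.path.exists(os.path.join(finetuned_dir, name)) else "MISSING"
    print(f"{name}: {status}")
```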

staoxiao commented 1 year ago

If the training run finishes normally, the tokenizer is saved automatically, so the run probably did not complete properly. You can copy the tokenizer files from the original bge model directory into the fine-tuned one.
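A minimal sketch of that fix, assuming the original checkpoint is BAAI/bge-large-zh-v1.5 (or a local copy of it) and the fine-tuned output is the Models/bge-large-zh-v1.5-finetune/ directory from the traceback: load the tokenizer from the original model and save it into the fine-tuned directory, which writes the missing tokenizer files next to the fine-tuned weights.

```python
from transformers import AutoTokenizer

base_model = "BAAI/bge-large-zh-v1.5"                 # original BGE checkpoint (a local path works too)
finetuned_dir = "Models/bge-large-zh-v1.5-finetune/"  # fine-tuned output missing the tokenizer files

# Re-save the original tokenizer alongside the fine-tuned weights.
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.save_pretrained(finetuned_dir)
```

After this, FlagModel('Models/bge-large-zh-v1.5-finetune/') should find both the weights and the tokenizer; copying tokenizer_config.json, vocab.txt, special_tokens_map.json, and tokenizer.json from the original model folder by hand achieves the same thing.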