FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs
MIT License

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference #477

Open wangszoo opened 6 months ago

wangszoo commented 6 months ago

Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at /bge/FlagEmbedding/examples/reranker/sft_model/0221-4/merged and are newly initialized: ['classifier.out_proj.bias', 'classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Data process: 0%| | 0/58 [00:00<?, ?it/s] ... Data process: 16%|█▌ | 9/58 [00:03<00:15, 3.14it/s]

Hello, what causes this problem during inference?

staoxiao commented 6 months ago

With newer transformers versions, the previous fine-tuning code had a bug when saving checkpoints: the classification head was not stored. This has been fixed in https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/reranker/modeling.py#L60; we recommend fine-tuning with the new code.
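A quick way to confirm whether the classification head actually made it into a saved checkpoint (a minimal sketch, assuming the output directory contains a pytorch_model.bin; newer transformers versions may write model.safetensors instead, and the path below is a placeholder):

import torch

# List the classification-head tensors in the saved reranker checkpoint.
state_dict = torch.load("output_dir/pytorch_model.bin", map_location="cpu")
head_keys = [k for k in state_dict if k.startswith("classifier.")]
print(head_keys)
# Expect classifier.dense.weight/bias and classifier.out_proj.weight/bias;
# an empty list means the head was not saved, which reproduces the warning above.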

wangszoo commented 6 months ago

Thanks for the reply, but I am already using the latest code:

def save_pretrained(self, output_dir: str):
    # Copy the full state dict to CPU before saving so that all weights,
    # including the classification head, are written to the checkpoint.
    state_dict = self.hf_model.state_dict()
    state_dict = type(state_dict)(
        {k: v.clone().cpu() for k, v in state_dict.items()}
    )
    self.hf_model.save_pretrained(output_dir, state_dict=state_dict)

Also, my transformers version is 4.33.0.

wangszoo commented 6 months ago

I tested again, and it seems to be caused by mix_models. Here is my code:

from LM_Cocktail import mix_models, mix_models_with_data

# Mix the fine-tuned model and the base model, then save the result to output_path.
model = mix_models(
    model_names_or_paths=["/models/bge-reranker-base", "/sft_model/0221-4/checkpoint-150"], 
    model_type='encoder', 
    weights=[0.5, 0.5],  # you can change the weights to get a better trade-off.
    output_path='/bge/FlagEmbedding/examples/reranker/sft_model/0221-4/merged-150')

staoxiao commented 6 months ago

Understood. When merging models, you need to set model_type='reranker'; otherwise the parameters will not be loaded completely. Also, model merging is not a required step.
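Applied to the snippet above, a corrected merge call would look like this (a minimal sketch reusing the same paths, with only model_type changed):

from LM_Cocktail import mix_models

# Merge the fine-tuned reranker with the base model; model_type='reranker'
# ensures the classification head is merged and saved as well.
model = mix_models(
    model_names_or_paths=["/models/bge-reranker-base", "/sft_model/0221-4/checkpoint-150"],
    model_type='reranker',
    weights=[0.5, 0.5],  # adjust to trade off base vs. fine-tuned behaviour
    output_path='/bge/FlagEmbedding/examples/reranker/sft_model/0221-4/merged-150')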

sevenandseven commented 3 months ago

> Understood. When merging models, you need to set model_type='reranker'; otherwise the parameters will not be loaded completely. Also, model merging is not a required step.

I'd like to ask: after fine-tuning a regular reranker model, how should I compute similarity scores? My fine-tuned output directory looks like this:

![Uploading 2.png…]()

staoxiao commented 3 months ago

> I'd like to ask: after fine-tuning a regular reranker model, how should I compute similarity scores? My fine-tuned output directory looks like this:

Sorry, I don't understand your question. What do you mean by a "regular reranker model" (普通的reranker模型)? Also, the picture has not been uploaded successfully.

sevenandseven commented 3 months ago

> I'd like to ask: after fine-tuning a regular reranker model, how should I compute similarity scores? My fine-tuned output directory looks like this:
>
> Sorry, I don't understand your question. What do you mean by a "regular reranker model" (普通的reranker模型)? Also, the picture has not been uploaded successfully.

The issue has been resolved; thank you for your reply. By "my regular reranker model," I mean those models that are not based on LLMs.
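For reference, scoring query-passage pairs with a fine-tuned cross-encoder reranker can be done with FlagEmbedding's FlagReranker (a minimal sketch; the model path is a placeholder for your fine-tuned or merged directory):

from FlagEmbedding import FlagReranker

# Load the fine-tuned reranker; use_fp16=True speeds up inference with a small accuracy cost.
reranker = FlagReranker("path/to/your/finetuned-reranker", use_fp16=True)

# compute_score takes [query, passage] pairs and returns relevance scores (higher = more relevant).
score = reranker.compute_score(["what is panda?", "The giant panda is a bear species endemic to China."])
print(score)

scores = reranker.compute_score([
    ["what is panda?", "hi"],
    ["what is panda?", "The giant panda is a bear species endemic to China."],
])
print(scores)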