FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs
MIT License

BGE-M3 pre-training: the loss occasionally rises #799

Open LLLiHaotian opened 6 months ago

LLLiHaotian commented 6 months ago

Is it normal for the loss to rise occasionally like this, and how should I judge when pre-training is complete?

bge-m3-patent-retromae_batch56_max350.log
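Occasional upticks in the raw loss are hard to interpret directly; plotting a moving average over the logged values helps distinguish transient spikes from a sustained upward trend. A minimal sketch, assuming the attached log contains lines with a numeric loss value (the exact log format and path are assumptions):

```python
# Sketch: smooth the loss curve from a training log to see whether occasional
# upticks are transient noise or a sustained upward trend.
# The log path and the loss-line regex are assumptions, not the verified layout.
import re
import matplotlib.pyplot as plt

losses = []
with open("bge-m3-patent-retromae_batch56_max350.log") as f:
    for line in f:
        # match e.g. "loss: 1.234" or "'loss': 1.234"; adjust to your log format
        m = re.search(r"loss[^0-9]{0,4}([0-9]+\.[0-9]+)", line)
        if m:
            losses.append(float(m.group(1)))

window = 100  # logged steps per moving-average window
smoothed = [
    sum(losses[max(0, i - window + 1): i + 1]) / len(losses[max(0, i - window + 1): i + 1])
    for i in range(len(losses))
]

plt.plot(losses, alpha=0.3, label="raw loss")
plt.plot(smoothed, label=f"moving average (window={window})")
plt.xlabel("logged step")
plt.ylabel("loss")
plt.legend()
plt.show()
```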

staoxiao commented 6 months ago

@LLLiHaotian, you need to fine-tune the model on your downstream data and select the best pre-training checkpoint based on the downstream performance.

LLLiHaotian commented 6 months ago

I only plan to use the encoder to produce representations, and I don't have a downstream task for the time being. So I would like to know how to determine which checkpoint gives the best embeddings. Looking forward to your answer.

staoxiao commented 5 months ago

There is no appropriate metric for evaluating the pre-training task itself. We recommend selecting the checkpoint based on performance after fine-tuning on a downstream task.
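One way to operationalize this advice: fine-tune each pre-training checkpoint with the repo's fine-tuning script, then compare the resulting encoders on a held-out retrieval dev set. A minimal sketch of the comparison step, assuming hypothetical checkpoint directories (`CKPT_DIRS`) and labeled dev pairs (`dev_pairs`), using plain `transformers` with CLS pooling:

```python
# Sketch: compare fine-tuned candidates by recall@1 on a small retrieval dev
# set. CKPT_DIRS and dev_pairs are hypothetical placeholders for your own
# fine-tuned checkpoint paths and labeled (query, positive passage) pairs.
import torch
from transformers import AutoModel, AutoTokenizer

CKPT_DIRS = ["output/checkpoint-10000", "output/checkpoint-20000"]
dev_pairs = [
    ("example query", "its relevant passage"),
    # ... more (query, positive passage) pairs from your domain
]

@torch.no_grad()
def embed(model, tok, texts):
    batch = tok(texts, padding=True, truncation=True, max_length=350, return_tensors="pt")
    cls = model(**batch).last_hidden_state[:, 0]  # CLS pooling, as BGE encoders use
    return torch.nn.functional.normalize(cls, dim=-1)

for ckpt in CKPT_DIRS:
    tok = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModel.from_pretrained(ckpt).eval()
    q = embed(model, tok, [p[0] for p in dev_pairs])
    d = embed(model, tok, [p[1] for p in dev_pairs])
    sims = q @ d.T  # cosine similarities (rows: queries, cols: passages)
    hits = (sims.argmax(dim=1) == torch.arange(len(dev_pairs))).float().mean().item()
    print(f"{ckpt}: recall@1 = {hits:.3f}")
```

If, as in the comment above, there is no downstream task at all, the same loop can be run on the raw pre-trained checkpoints as a rough zero-shot proxy, but expect the signal to be noisy: RetroMAE checkpoints have not yet been contrastively tuned for similarity.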

friendshipity commented 1 month ago

> There is no appropriate metric for evaluating the pre-training task itself. We recommend selecting the checkpoint based on performance after fine-tuning on a downstream task.

After pre-training on my task-specific training data, what type of data should I use for fine-tuning on the downstream retrieval task? I'm unsure whether to use only my own downstream data (similar sentence pairs, not very many) for fine-tuning, or to combine a large amount of public STS/retrieval data with my own downstream data.
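For context, the fine-tuning examples in this repo take JSONL training data where each line holds a query, a list of positive passages, and a list of negatives. A minimal sketch of combining a small in-domain set with sampled public data into one training file (file names and the sampling cap are illustrative assumptions):

```python
# Sketch: build a mixed fine-tuning file in the JSONL format used by the
# repo's fine-tuning examples: {"query": str, "pos": [str, ...], "neg": [str, ...]}.
# The file names and the 4:1 public-to-own sampling cap are illustrative assumptions.
import json
import random

def load_jsonl(path):
    with open(path) as f:
        return [json.loads(line) for line in f]

own = load_jsonl("own_pairs.jsonl")            # your small in-domain set
public = load_jsonl("public_retrieval.jsonl")  # e.g. converted public STS/retrieval data

random.seed(0)
mixed = own + random.sample(public, min(len(public), 4 * len(own)))
random.shuffle(mixed)

with open("train_mixed.jsonl", "w") as f:
    for ex in mixed:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

A common pattern is to cap the public data relative to the small in-domain set so the latter is not drowned out; the right ratio depends on how much in-domain data you have.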