jhyeom1545 closed this issue 5 months ago.
@jhyeom1545 Hi, I want to ask: when you use the scores of the m3-reranker for fine-tuning, do you normalize the reranker scores to (0, 1) beforehand?
@jhyeom1545, it could be that there is noise in your data, i.e. wrong positive and negative samples. You can try to filter the training data.
@ngothanhnam0910, normalized scores are not appropriate for fine-tuning because the distribution becomes too smooth after softmax. You should use the raw scores, before normalization.
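For example, a quick sketch (with made-up scores) of how applying softmax to (0, 1)-normalized scores flattens the target distribution compared with softmax over the raw scores:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical raw reranker scores for one positive and one negative passage.
raw_scores = np.array([6.0, -4.0])

# The same scores squashed to (0, 1) with a sigmoid, as in score normalization.
normalized = 1.0 / (1.0 + np.exp(-raw_scores))   # ~[0.998, 0.018]

print(softmax(raw_scores))   # ~[0.99995, 0.00005] -> sharp teacher distribution
print(softmax(normalized))   # ~[0.73, 0.27]       -> overly smooth teacher distribution
```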
@ngothanhnam0910 Hi, I'm using scores that are not normalized.
In the issue linked below, I received an answer that I should use scores that are not normalized: https://github.com/FlagOpen/FlagEmbedding/issues/701
@staoxiao Hi, Thanks for your comment.
To remove noise from the data, we only keep passages that the reranker scores at 2 or more as positives.
For negative data, I mine hard negatives with hn-mine over the ranking range 35 to 60 using the baai/bge-m3 model.
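As a rough sketch, the positive-filtering step looks something like this (assuming BAAI/bge-reranker-v2-m3 as the m3-reranker; file names are placeholders):

```python
import json
from FlagEmbedding import FlagReranker

# Assumption: the "m3-reranker" here is BAAI/bge-reranker-v2-m3; swap in the model you actually use.
reranker = FlagReranker('BAAI/bge-reranker-v2-m3', use_fp16=True)

SCORE_THRESHOLD = 2.0  # keep only positives the reranker scores at 2 or more

with open('train_raw.jsonl') as fin, open('train_filtered.jsonl', 'w') as fout:
    for line in fin:
        item = json.loads(line)
        pairs = [[item['query'], p] for p in item['pos']]
        scores = reranker.compute_score(pairs)
        if isinstance(scores, float):          # a single pair returns a plain float
            scores = [scores]
        kept = [p for p, s in zip(item['pos'], scores) if s >= SCORE_THRESHOLD]
        if kept:                               # drop examples with no confident positive
            item['pos'] = kept
            fout.write(json.dumps(item, ensure_ascii=False) + '\n')
```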
With my fine-tuned model, the negatives seem to be learned, but the similarity of the positives drops, even into negative values.
Is there anything else I can try to increase positive similarity?
I'm also considering a two-step strategy and wondering whether it would be useful (see the sketch after the steps):
Step 1. Train with train_group_size=8 so that the negatives are learned well (at this stage the positive similarity score of my fine-tuned model drops).
Step 2. Continue fine-tuning with train_group_size=1 and no negatives to raise the positive scores (the query and pos data are the same as in Step 1; the neg data is simply not used).
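A minimal sketch of how the Step 2 data would be derived from the Step 1 data, assuming the toy-data JSONL format with query/pos/neg fields (file names are placeholders):

```python
import json

# Derive Step 2 training data (positives only, train_group_size=1) from the Step 1 file.
with open('step1_train.jsonl') as fin, open('step2_train.jsonl', 'w') as fout:
    for line in fin:
        item = json.loads(line)
        item['neg'] = []          # same query/pos as Step 1, just no negatives
        fout.write(json.dumps(item, ensure_ascii=False) + '\n')
```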
@jhyeom1545, I guess the reason is that the negative samples are too challenging, so the model has to reduce the scores. You can use a larger sampling range (e.g., 35-60 -> 1-300).
Besides, lower scores for the positive samples do not necessarily hurt ranking accuracy. For downstream tasks such as passage retrieval or semantic similarity, what matters is the relative order of the scores, not their absolute values.
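To illustrate with made-up numbers: even if every similarity drops after fine-tuning, the ranking can stay the same (or the margin can even grow):

```python
import numpy as np

# Similarities of one positive and three negatives for the same query,
# before and after fine-tuning (illustrative numbers only).
before = np.array([0.62, 0.45, 0.41, 0.30])
after  = np.array([0.48, 0.22, 0.18, 0.05])   # all lower in absolute terms

# Retrieval only cares about the ordering: the positive is still ranked first.
print(np.argsort(-before))   # [0 1 2 3]
print(np.argsort(-after))    # [0 1 2 3]

# The margin between the positive and the closest negative actually grew here.
print((after[0] - after[1:]).min() > (before[0] - before[1:]).min())  # True
```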
I am fine-tuning m3-base or m3-base-unsupervised, and I have a question about the fine-tuning results.
I'm fine-tuning using the Toy Data format from Unified Fine-tuning, with about 200,000+ examples.
However, after fine-tuning, the neg score improved (the similarity between neg passages and the query decreased), but the pos score did not improve and actually got worse (the similarity between pos passages and the query also decreased). Can you tell me why this happens?
We looked at the average similarity over the same 100 queries: the baai/bge-m3 model gave a similarity of about 0.6, which dropped to 0.5 or 0.4 after fine-tuning.
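For reference, this is roughly how the average query-positive dense similarity can be measured, assuming BGEM3FlagModel dense embeddings (model path and data file are placeholders):

```python
import json
import numpy as np
from FlagEmbedding import BGEM3FlagModel

def average_pos_similarity(model_path, data_path):
    # Dense vectors from BGEM3FlagModel are normalized, so a dot product is a cosine similarity.
    model = BGEM3FlagModel(model_path, use_fp16=True)
    sims = []
    with open(data_path) as f:
        for line in f:
            item = json.loads(line)
            q_vec = model.encode([item['query']])['dense_vecs'][0]
            p_vec = model.encode([item['pos'][0]])['dense_vecs'][0]
            sims.append(float(np.dot(q_vec, p_vec)))
    return float(np.mean(sims))

# Compare the base model and the fine-tuned checkpoint on the same 100 evaluation queries.
print(average_pos_similarity('BAAI/bge-m3', 'eval_100.jsonl'))
print(average_pos_similarity('path/to/finetuned-bge-m3', 'eval_100.jsonl'))
```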
To increase the similarity, I also tried training with an instruction attached to the query, but the results were similar.
Is there a way for me to improve the similarity?
I'm applying knowledge distillation using the scores of the m3-reranker, and fine-tuning with the template as below.
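A minimal sketch of the kind of per-line record I mean (the values are illustrative, and I'm assuming pos_scores / neg_scores are the fields that carry the raw reranker scores, aligned one-to-one with pos / neg):

```python
import json

# One training example for unified fine-tuning with reranker-score distillation.
# Assumption: the raw (un-normalized) reranker scores go in pos_scores / neg_scores.
example = {
    "query": "what is knowledge distillation?",
    "pos": ["Knowledge distillation transfers knowledge from a teacher model to a student model."],
    "neg": [
        "The Eiffel Tower is located in Paris.",
        "Gradient descent updates parameters in the direction of the negative gradient.",
    ],
    "pos_scores": [7.31],          # raw m3-reranker score for the positive
    "neg_scores": [-5.12, -1.04],  # raw m3-reranker scores for the negatives
}

# The training data is one JSON object per line (JSONL).
with open("train_kd.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```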