FlagOpen / FlagEmbedding

Retrieval and Retrieval-augmented LLMs
MIT License

Training Loss Unchanged During Fine-Tuning when Varying Negative Pairs #958

Closed doramasma closed 1 month ago

doramasma commented 1 month ago

Hello! I'm running into unexpected behavior during fine-tuning: the training loss stays the same even though I vary the number of negative pairs in the fine-tuning dataset. Details below:

I've been running different fine-tuning experiments on datasets composed of queries, where each query has one positive pair (the positive pair associated with a given query is the same across all three datasets). The only difference between the datasets is the number of negative pairs per query; I created three datasets for this purpose.

Expected Behavior: I expected the training loss to change as the number of negative pairs increases across the datasets. I'm not saying it should get better or worse, just that the loss should change, since the model is being trained with different information, right?

Observed Behavior: The training loss does not show any noticeable change despite the variation in the number of negative pairs. As you can see in the image below, I've logged the runs with Weights & Biases, and the loss appears to be the same.

[Image: Weights & Biases training-loss curves for the three datasets, showing no visible difference between runs]

Any ideas or advice? Do I need to enable a flag or configure something special so that the list of negative pairs is used during fine-tuning? For reference, each training example follows this format:

{
  "query": "string",
  "pos": ["list of positive strings"],
  "neg": ["list of negative strings"]
}

Thanks in advance!

staoxiao commented 1 month ago

@doramasma , did you change the hyper-parameter --train_group_size (which controls the number of negatives used)?
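
As a quick check, you can verify that each example provides enough negatives for the value you pass (a minimal sketch, assuming the JSONL data format above; the file name and the value of train_group_size are placeholders):

# Minimal sanity check: every example needs at least train_group_size - 1
# entries in "neg". File name and value below are placeholders.
import json

train_group_size = 8  # must match the value passed to fine-tuning

with open("train_data.jsonl", encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        example = json.loads(line)
        if len(example["neg"]) < train_group_size - 1:
            print(f"line {line_no}: only {len(example['neg'])} negatives, "
                  f"need at least {train_group_size - 1}")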

doramasma commented 1 month ago

Hello! Thanks for answering.

I believe this hyper-parameter was the problem. I was following the example guide for unified_finetune, and it is not explained there. Now, looking at its definition, I have another question:

train_group_size: the number of positives and negatives for a query in training. There is always one positive, so this argument will control the number of negatives (#negatives = train_group_size - 1). Note that the number of negatives should not be larger than the number of negatives in the data's "neg": List[str]. Besides the negatives in this group, the in-batch negatives will also be used in fine-tuning.

It says, "There is always one positive, so this argument will control the number of negatives." So, what happens if I want to have 5 positive and 5 negative pairs? What would the values of the parameters be?

Thanks in advance!

cc: @staoxiao

staoxiao commented 1 month ago

@doramasma , during training we sample one positive example from the 'pos' list and then sample train_group_size - 1 negatives from the 'neg' list. So, if you have 5 positive and 5 negative pairs, you can set train_group_size <= 6.
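
Roughly, the sampling works like this (a minimal sketch of the behaviour described above, not the actual trainer code):

# Minimal sketch: one positive is sampled from "pos" and
# train_group_size - 1 negatives are sampled from "neg".
import random

def sample_group(example, train_group_size):
    positive = random.choice(example["pos"])
    negatives = random.sample(example["neg"], train_group_size - 1)
    return [positive] + negatives

example = {
    "query": "q",
    "pos": ["p1", "p2", "p3", "p4", "p5"],
    "neg": ["n1", "n2", "n3", "n4", "n5"],
}

# With 5 positives and 5 negatives, train_group_size can be at most 6:
# 1 sampled positive + 5 sampled negatives.
print(sample_group(example, train_group_size=6))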

doramasma commented 1 month ago

Thank you so much!