UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

MultipleNegativesRankingLoss Multiple negatives in the loss function #2693

Closed: daegonYu closed this issue 5 months ago

daegonYu commented 5 months ago

In the MultipleNegativesRankingLoss loss function, the number of positives is 1, and I want to use more than one negative. Is there a way to do this?

ir2718 commented 5 months ago

Hi,

I am not exactly sure I understand your issue, so I will briefly go over what the loss does.

    def forward(self, sentence_features: Iterable[Dict[str, Tensor]], labels: Tensor):
        # Embed each input column: [anchors, positives, (optional) negatives, ...]
        reps = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
        embeddings_a = reps[0]  # anchors, shape (B, dim)
        embeddings_b = torch.cat(reps[1:])  # positives (and any negatives), shape (B, dim) or (2B, dim), etc.

        scores = self.similarity_fct(embeddings_a, embeddings_b) * self.scale
        # Example: a[i] should match with b[i]
        range_labels = torch.arange(0, scores.size(0), device=scores.device)
        return self.cross_entropy_loss(scores, range_labels)

The line of code before the return statement creates a tensor of indices that serves as the labels: its values run from 0 up to the batch size minus one. So, the scores form a $B \times B$ tensor when using pairs, or a $B \times 2B$ tensor when using triplets.

As you can see, MNRL actually uses a cross-entropy loss under the hood. For example, if you're using pairs, this means that each anchor has a single positive example and $B - 1$ negative examples ($2B - 1$ in the case of triplets), and you're trying to determine which example corresponds to the anchor. Basically, you're doing multi-class classification of an anchor against a single positive and a bunch of negatives. So, to answer your question: the loss already uses negatives by default.
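To make that concrete, here is a minimal plain-PyTorch sketch of this classification view, assuming cosine similarity and a scale of 20 (the MNRL defaults); the tensor names and sizes are made up for illustration:

    import torch
    import torch.nn.functional as F

    B, dim = 4, 8  # toy batch size and embedding dimension
    anchors = torch.randn(B, dim)
    positives = torch.randn(B, dim)
    negatives = torch.randn(B, dim)  # one explicit negative per anchor (the triplet case)

    # Candidates are the B positives followed by the B negatives,
    # so the score matrix has shape (B, 2B).
    candidates = torch.cat([positives, negatives])
    scores = F.cosine_similarity(anchors.unsqueeze(1), candidates.unsqueeze(0), dim=-1) * 20.0

    # Anchor i should match candidate i (its own positive); the other
    # 2B - 1 columns in its row act as negatives.
    labels = torch.arange(B)
    loss = F.cross_entropy(scores, labels)
    print(scores.shape, loss.item())  # torch.Size([4, 8]) and a scalar loss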

tomaarsen commented 5 months ago

Hello!

@ir2718 is right: if you have positive pairs (anchor_i, positive_i), then every (anchor_i, positive_j) with i != j is treated as a negative pair, i.e., all positives belonging to the other anchors act as negatives. In practice this gives batch_size - 1 negatives per anchor.

You can also provide extra negatives yourself: with triplets (anchor_i, positive_i, negative_i), the pairs (anchor_i, negative_i), (anchor_i, negative_j), and (anchor_i, positive_j) are all treated as negatives. In practice this gives batch_size * 2 - 1 negatives per anchor. And you can indeed add multiple negatives: (anchor_i, positive_i, negative_1_i, negative_2_i, ..., negative_n_i), where all positives and negatives from the other anchors are also used as negatives. That amounts to batch_size * (num_negative_columns + 1) - 1 negatives per anchor.
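As a quick worked example: with a batch size of 32 and 2 negative columns, each anchor is scored against 32 * (2 + 1) = 96 candidates, of which 1 is its own positive and the remaining 32 * 3 - 1 = 95 act as negatives.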

An extreme example that you can use yourself is https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3/viewer/triplet-50, which uses a whopping 50 negatives. A common example is 1 negative: https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3/viewer/triplet

But you can also use e.g. 3 negatives, as in the sketch below. The only limitation is that every sample must have the same number of negatives.
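For reference, here is a rough sketch of what training with 3 negative columns could look like, assuming the SentenceTransformerTrainer API; the model name, column names, and data are only illustrative:

    from datasets import Dataset
    from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
    from sentence_transformers.losses import MultipleNegativesRankingLoss

    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    # Every sample must have the same number of negatives (here: 3 columns).
    train_dataset = Dataset.from_dict({
        "anchor": ["how to cook rice", "capital of france"],
        "positive": ["rinse the rice, then simmer it in water", "paris is the capital of france"],
        "negative_1": ["rice is a cereal grain", "france borders spain"],
        "negative_2": ["how to cook pasta", "capital of germany"],
        "negative_3": ["a recipe for fried noodles", "the eiffel tower is in paris"],
    })

    loss = MultipleNegativesRankingLoss(model)
    trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
    trainer.train()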

daegonYu commented 5 months ago

Thank you both for your replies. From what I understand, I can provide explicit negatives in addition to the in-batch negatives, and to use multiple negatives, every sample must have the same number of them. Is this right?

tomaarsen commented 5 months ago

That is right. This is because the datasets library works in columns, and each column must contain just text strings. So, you can't have a different number of negative columns for different samples.
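If your raw data happens to have a variable number of negatives per sample, one simple workaround (just a sketch, not an official API) is to truncate every sample to a fixed count before building the dataset:

    from datasets import Dataset

    raw = [
        {"anchor": "q1", "positive": "p1", "negatives": ["n1", "n2", "n3", "n4"]},
        {"anchor": "q2", "positive": "p2", "negatives": ["n5", "n6", "n7"]},
    ]

    num_negatives = 3  # every sample must provide at least this many
    rows = {
        "anchor": [r["anchor"] for r in raw],
        "positive": [r["positive"] for r in raw],
        # Keep only the first `num_negatives` negatives of each sample.
        **{
            f"negative_{i + 1}": [r["negatives"][i] for r in raw]
            for i in range(num_negatives)
        },
    }
    train_dataset = Dataset.from_dict(rows)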

daegonYu commented 5 months ago

Thank you for your reply. It was very helpful.