Closed · daegonYu closed this issue 5 months ago

In the MultipleNegativesRankingLoss loss function, the number of positives is 1 and I want to set the number of negatives to multiple. Is there a way to do this?
Hi,
I am not exactly sure I understand your issue, so I will briefly go over what the loss does.
def forward(self, sentence_features: Iterable[Dict[str, Tensor]], labels: Tensor):
    # Embed each column: reps[0] = anchors, reps[1] = positives, reps[2:] = optional negatives
    reps = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
    embeddings_a = reps[0]
    embeddings_b = torch.cat(reps[1:])

    # Similarity of every anchor against every candidate, scaled
    scores = self.similarity_fct(embeddings_a, embeddings_b) * self.scale
    # Example: a[i] should match with b[i]
    range_labels = torch.arange(0, scores.size(0), device=scores.device)
    return self.cross_entropy_loss(scores, range_labels)
The line of code before the return statement creates a tensor of label indices, with values from 0 up to the batch size minus one. The scores form a $B \times B$ tensor if you use pairs, or a $B \times 2B$ tensor if you use triplets.
As you can see, MNRL actually uses a cross-entropy loss under the hood. If you're using pairs, each anchor has a single positive example and $B - 1$ negative examples ($2B - 1$ in the case of triplets), and the task is to determine which example corresponds to the anchor. Basically, you're doing multi-class classification of an anchor against a single positive and a bunch of negatives. So, to answer your question: yes, the loss uses in-batch negatives by default.
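To make the shapes concrete, here is a rough, self-contained sketch of the triplet case with a toy batch of B = 3; the cosine similarity and the scale of 20 stand in for similarity_fct and self.scale and are only illustrative values:

import torch
from torch import nn

B, dim = 3, 4  # toy batch size and embedding dimension
torch.manual_seed(0)
anchors   = torch.randn(B, dim)  # reps[0]
positives = torch.randn(B, dim)  # reps[1]
negatives = torch.randn(B, dim)  # reps[2]

embeddings_a = anchors
embeddings_b = torch.cat([positives, negatives])  # shape (2B, dim)

# Cosine similarity of every anchor against every candidate, scaled
scores = nn.functional.cosine_similarity(
    embeddings_a.unsqueeze(1), embeddings_b.unsqueeze(0), dim=-1
) * 20.0
print(scores.shape)  # torch.Size([3, 6]), i.e. B x 2B

# Column i is the correct "class" for anchor i (its own positive); all other columns are negatives
labels = torch.arange(B)
loss = nn.CrossEntropyLoss()(scores, labels)
print(loss)

Each row of scores is a 2B-way classification over 1 positive and 2B - 1 negatives, which is exactly the count mentioned above.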
Hello!
@ir2718 is right: if you have positive pairs (anchor_i, positive_i), then all pairs (anchor_i, positive_j) with i != j are seen as negative pairs, i.e., all positives from the other anchors act as negatives. In reality this is batch_size - 1 negatives per anchor.
You can also provide extra negatives yourself with triplets (anchor_i, positive_i, negative_i). Then (anchor_i, negative_i), (anchor_i, negative_j), and (anchor_i, positive_j) are all seen as negative pairs. In reality this is batch_size * 2 - 1 negatives per anchor.
And you can indeed add multiple negatives: (anchor_i, positive_i, negative_1_i, negative_2_i, ..., negative_n_i), where all positives and negatives from the other anchors are also seen as negatives. This means batch_size * (num_negative_columns + 1) - 1 negatives per anchor.
An extreme example that you can use yourself is https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3/viewer/triplet-50, which uses a whopping 50 negatives. A common example is 1 negative: https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3/viewer/triplet
But you can also use e.g. 3 negatives. The only limitation is that every sample must have the same number of negatives.
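As a rough sketch, assuming the v3-style SentenceTransformerTrainer API (the model name, column names, and toy data below are purely illustrative), a setup with 2 negative columns could look like this; MNRL treats the first dataset column as the anchor, the second as the positive, and any remaining columns as negatives:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

# Toy data: every row has the same columns, i.e. the same number of negatives per sample
train_dataset = Dataset.from_dict({
    "anchor":     ["what is python", "capital of france"],
    "positive":   ["Python is a programming language.", "Paris is the capital of France."],
    "negative_1": ["A python is a large constrictor snake.", "Lyon is a large city in France."],
    "negative_2": ["Monty Python is a comedy group.", "Berlin is the capital of Germany."],
})

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
loss = losses.MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()

With a batch size of 16 and 2 negative columns, each anchor is compared against 16 * 3 = 48 candidates, of which 47 are negatives, matching the batch_size * (num_negative_columns + 1) - 1 formula above.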
Thank you both for your replies. From what I understand, I can use multiple explicit negatives in addition to the in-batch negatives. I also understood that to use multiple negatives, every sample must have the same number of negatives. Is this right?
That is right. This is because datasets works with columns, and each column may only contain text strings, so you can't have a different number of columns for each sample.
Thank you for your reply. It was very helpful.