Open datistiquo opened 3 years ago
If the chance is low that both are in the same mini-batch, then you can use it without changes.
If there is a high chance, you can write your own PyTorch DataLoader that ensures that A1 is not twice in the same mini-batch.
If the chance is low that both are in the same mini-batch, then you can use it without changes.
What do you mean by that?
OK, then I am right, and it seems like you have to know in detail what is going on behind the scenes and take care of it yourself. I assumed normal "behaviour" in usage, like for contrastive loss.
I have data (e.g. 1000 examples in total) where each anchor has up to 5 positive examples.
So it is like the format for training a contrastive loss. I have little data, so I can see large effects. That is why I asked here.
It would be very nice if you could explain how I have to structure my data and how it is influenced by the batch size, since there seem to be some issues with no intuitive handling. Do I have to leave a gap of batch-size length between correlated positive pairs, for example?
@nreimers A short explanation of the above points would be very nice. I can imagine that other people experience similar issues.
Is it possible that this loss is only suitable for very large datasets and/or for having just a single positive example per anchor (so that the chances of a collision are low or do not disturb learning strongly)? Both scenarios are somewhat unrealistic in the real world.
I would definitely mention this kind of behaviour as a comment in the docs!
Saying
This loss expects as input a batch consisting of sentence pairs (a_1, p_1), (a_2, p_2)…, (a_n, p_n) where we assume that (a_i, p_i) are a positive pair and (a_i, p_j) for i!=j a negative pair.
does somehow imply that your dataloader takes care of all of it...
I assume that it just picks the negatives randomly from the batch, excluding the current positive example? So it does not look at whether other examples in the batch possibly belong to the same anchor?
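To make the pairing concrete, here is a minimal sketch of that in-batch behaviour, assuming the loss in question is MultipleNegativesRankingLoss from sentence-transformers (the model name and the toy strings are just placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # any bi-encoder checkpoint

# A toy batch of (anchor, positive) pairs. For the anchor "a1", the loss
# treats "p2" and "p3" as in-batch negatives -- even if they happen to be
# related to "a1" as well. Nothing checks for such collisions automatically.
train_examples = [
    InputExample(texts=["a1", "p1"]),
    InputExample(texts=["a2", "p2"]),
    InputExample(texts=["a3", "p3"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=3)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```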
This loss expects as input a batch consisting of sentence pairs (a_1, p_1), (a_2, p_2)…, (a_n, p_n) where we assume that (a_i, p_i) are a positive pair and (a_i, p_j) for i!=j a negative pair.
So I should take care of this on my own. In your bi-encoder example, and in general, do you assume that by chance there are no related queries in the same batch?
Because I am not so familiar with the handling in PyTorch up to now:
Am I right that batches are intuitively (like in Keras) built in the order of the dataset? So when positive examples are sorted next to each other (grouping related queries), the chances are very high that they occur in the same batch, but as negatives (using shuffle=False as the default)? This would explain my bad results...
Thank you so much! :)
It is recommended to use shuffle=True for the dataloader. If it is false, it keeps the original order of the dataset.
It is not an issue if only a few mini-batches have multiple positives. It becomes an issue if most mini-batches have more than one positive per anchor.
In that case, you can either create a custom PyTorch dataset or a PyTorch dataloader that loads your data such that there are no multiple positives in the same batch: https://pytorch.org/docs/stable/data.html
How to do this depends on your specific dataset, what type of duplicate positives you have, and how to identify them.
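One possible way to do this is a custom batch sampler passed to the DataLoader via its batch_sampler argument. This is only a minimal sketch, assuming you can provide a group id (e.g. an anchor id) for every item in your dataset:

```python
import random
from torch.utils.data import DataLoader, Sampler

class UniqueAnchorBatchSampler(Sampler):
    """Yields lists of dataset indices such that no two items in a batch
    share the same group id (e.g. the same anchor)."""

    def __init__(self, group_ids, batch_size):
        self.group_ids = list(group_ids)  # group_ids[i] = anchor id of item i
        self.batch_size = batch_size

    def __iter__(self):
        pending = list(range(len(self.group_ids)))
        random.shuffle(pending)
        while pending:
            batch, seen, rest = [], set(), []
            for idx in pending:
                gid = self.group_ids[idx]
                if gid in seen or len(batch) == self.batch_size:
                    rest.append(idx)      # postpone to a later batch
                else:
                    batch.append(idx)
                    seen.add(gid)
            yield batch
            pending = rest

    def __len__(self):
        # Lower bound: data with many duplicates may yield more, smaller batches.
        return (len(self.group_ids) + self.batch_size - 1) // self.batch_size

# Hypothetical usage: `train_examples` and `anchor_ids` come from your own data.
# sampler = UniqueAnchorBatchSampler(anchor_ids, batch_size=32)
# loader = DataLoader(train_examples, batch_sampler=sampler)
```

With this sketch, duplicates are not dropped but postponed to a later batch, so no training pairs are lost.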
Thank you.
It is not an issue if only a few mini-batches have multiple positives. It becomes an issue if most mini-batches have more than one positive per anchor.
Just for understanding: if the positive examples are ordered next to each other, then without shuffling most (or even all) batches contain multiple positives for the same anchor? So learning is bad.
Here is what I mean exactly:
Anchor A1 has a few positive examples P11, P12, ..., P1N. They are ordered in the following way (training data):
Anchor, Positives
A0, P01
...
A0, P0N
A1, P11
A1, P12
...
A1, P1N
...
Then I can assume it is most probable that within a batch, e.g., P12 is used as a negative for A1, although P12 is also a positive for A1?
Is this correct?
And shuffling works as usual, so that elements are picked randomly (and then used as negatives)?
Yes, that would be an issue.
Yes, that would be an issue.
Thank you. Do you have any example of how to structure the batch via a PyTorch dataloader?
@nreimers
In that case, you can either create a custom PyTorch dataset or a PyTorch dataloader that loads your data such that there are no multiple positives in the same batch: https://pytorch.org/docs/stable/data.html
How to do this depends on your specific dataset, what type of duplicate positives you have, and how to identify them.
Hey, I just took a look at this again. I struggled a little bit because, as you mentioned above, I first tried to do this in the dataset. But I think the dataloader is the right place to structure a custom batch? Isn't the right place for this actually the collate_fn function, where you can easily check that a class label is not repeated within a batch and ignore duplicate positive examples?
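A collate_fn can indeed drop duplicates within a batch, though the batch then shrinks by however many items are removed, whereas a batch sampler (as sketched above) postpones the duplicates to later batches instead. Here is a minimal sketch of the collate_fn idea, assuming each item carries a label identifying its anchor (all names here are hypothetical):

```python
from torch.utils.data import DataLoader

def drop_duplicate_anchor_collate(batch):
    """Keeps only the first item per anchor label within a batch.
    Assumes each item is a (label, anchor_text, positive_text) tuple;
    adapt the unpacking to your own dataset format."""
    seen, filtered = set(), []
    for label, anchor, positive in batch:
        if label in seen:
            continue  # this anchor already has a positive in the batch
        seen.add(label)
        filtered.append((label, anchor, positive))
    return filtered

# Hypothetical usage with a dataset yielding (label, anchor, positive) tuples:
# loader = DataLoader(my_dataset, batch_size=32, shuffle=True,
#                     collate_fn=drop_duplicate_anchor_collate)
```

One caveat: if I remember the fit implementation correctly, sentence-transformers' model.fit assigns its own smart_batching_collate to the dataloader, which would replace a custom collate_fn, so the batch-sampler route may be the more robust place for this filtering.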
Hey,
I wonder if this loss is usable if you have more than one positive example per anchor in your training data. I understand that the loss takes all examples other than the positive example as negative examples. But what if you have training data like:
A1, B1
A1, B2
...
where B1, B2 are just two positive examples for my query A1?
Otherwise this loss would not be suitable for training when you have multiple positive examples for a query, which sounds a little impractical...? So I guess I am wrong?
Also, do I need to take care of the structure of the training data by hand? What if the first half of the training data contains just the positive examples and the last half all the negative examples?
Please enlighten me. :)