UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

ContrastiveLoss with hard negatives? #900

Open PaulForInvent opened 3 years ago

PaulForInvent commented 3 years ago

Hey,

is it possible to use hard negatives with this loss? I think for OnlineContrastiveLoss and the BatchHard losses you cannot and do not need to, since they are computed automatically, right?

Besides MultipleNegativesRankingLoss, which losses can use hard negatives?

nreimers commented 3 years ago

For ContrastiveLoss you define the pairs yourself, so there you can simply pass hard negatives.

Otherwise, TripletLoss also allows working with hard negatives.

PaulForInvent commented 3 years ago

@nreimers thanks.

For ContrastiveLoss you define the pairs yourself, so there you can simply pass hard negatives.

How do you hand those over? Is there a difference between having hard negatives somewhere in my training data and handing them over in some special way?

And for OnlineContrastiveLoss, is it possible to include hard ones manually? Actually, for this loss, do I have to take care if there are multiple positive samples of one class in my batch (like for MultipleNegativesRankingLoss)?

nreimers commented 3 years ago

For ContrastiveLoss / OnlineContrastiveLoss you pass pairs, i.e.:

train_samples = [InputExample(texts=['This is a ', 'positive pair'], label=1), InputExample(texts=['This is a ', 'negative pair'], label=0)]

You need to create these pairs yourself, so you can choose any strategy you like.
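A minimal sketch (not part of the original comment) of how such pairs, including hand-picked hard negatives, can be fed into ContrastiveLoss; the model name and example texts are placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Placeholder base model
model = SentenceTransformer('distilbert-base-nli-mean-tokens')

train_samples = [
    # Positive pair
    InputExample(texts=['How do I reset my password?',
                        'Steps to change your account password'], label=1),
    # Random (easy) negative pair
    InputExample(texts=['How do I reset my password?',
                        'Best hiking trails in the Alps'], label=0),
    # Hard negative pair: lexically similar but semantically different
    InputExample(texts=['How do I reset my password?',
                        'How do I reset my router?'], label=0),
]

train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)
train_loss = losses.ContrastiveLoss(model=model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```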

PaulForInvent commented 3 years ago

For ContrastiveLoss / OnlineContrastiveLoss you pass pairs, i.e.:

train_samples = [InputExample(texts=['This is a ', 'positive pair'], label=1), InputExample(texts=['This is a ', 'negative pair'], label=0)]

You need to create these pairs yourself, so you can choose any strategy you like.

Thanks, and this actually also holds for OnlineContrastiveLoss, right?

And since the online version works on batches, I again feel that multiple positive samples of the same class could lead to confusion? Do they count as negative samples?

PaulForInvent commented 3 years ago

@nreimers

According to

    You can also provide one or multiple hard negatives per anchor-positive pair by structuring the data like this:
    (a_1, p_1, n_1), (a_2, p_2, n_2)

how would I do that for multiple hard negatives? As a list? Doesn't indexing play a role, so just using

(a_1, p_1, n_1), (a_1, p_1, n_2)

is maybe not a good idea?

nreimers commented 3 years ago

For MultipleNegativesRankingLoss you can provide multiple negatives in the same tuple. You just have to make sure that all tuples have the same number of texts.

Otherwise, what you showed is suitable for TripletLoss. There you provide triplets in the format you posted.
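A minimal sketch (not part of the original reply) contrasting the two input formats; all texts and the model name are placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Placeholder base model
model = SentenceTransformer('distilbert-base-nli-mean-tokens')

# MultipleNegativesRankingLoss: (anchor, positive, hard_neg_1, hard_neg_2, ...)
# Every InputExample must contain the same number of texts.
mnrl_samples = [
    InputExample(texts=['anchor 1', 'positive 1', 'hard negative 1a', 'hard negative 1b']),
    InputExample(texts=['anchor 2', 'positive 2', 'hard negative 2a', 'hard negative 2b']),
]
mnrl_loader = DataLoader(mnrl_samples, shuffle=True, batch_size=16)
mnrl_loss = losses.MultipleNegativesRankingLoss(model=model)

# TripletLoss: exactly one (anchor, positive, negative) triplet per example.
triplet_samples = [
    InputExample(texts=['anchor 1', 'positive 1', 'hard negative 1']),
    InputExample(texts=['anchor 2', 'positive 2', 'hard negative 2']),
]
triplet_loader = DataLoader(triplet_samples, shuffle=True, batch_size=16)
triplet_loss = losses.TripletLoss(model=model)
```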

PaulForInvent commented 3 years ago

For MultipleNegativesRankingLoss you can provide multiple negatives in the same tuple. You just have to make sure that all tuples have the same number of texts.

Otherwise, what you showed is suitable for TripletLoss. There you provide triplets in the format you posted.

Sorry, but what is the syntax?

I took the format from the MultipleNegativesRankingLoss docstring, so ...

https://github.com/UKPLab/sentence-transformers/blob/0799eb3a5b9ccd5e3d024b4991a68321b98c6c8a/sentence_transformers/losses/MultipleNegativesRankingLoss.py#L23

PaulForInvent commented 3 years ago

For MultipleNegativesRankingLoss you can provide multiple negatives in the same tuple. You just have to make sure that all tuples have the same number of texts. Otherwise, what you showed is suitable for TripletLoss. There you provide triplets in the format you posted.

Sorry, but what is the syntax?

@nreimers ;)

PaulForInvent commented 3 years ago

@nreimers Right now I also thought of something. Do you have any experience with using hard negatives along with losses like the online batch triplet losses? I would guess that if you have explicit hard negatives within a batch, this should give some learning enhancement?

With the online losses you have implemented, you cannot give explicit hard negatives, right? So right now, if hard negatives are in the training data, they end up in the same batch only by chance?

nreimers commented 3 years ago

@PaulForInvent I did not work with the OnlineContrastiveLoss examples too much; it was contributed by someone else. So sadly I have no experience with how to add hard negatives to it.

PaulForInvent commented 3 years ago

@nreimers For OnlineContrastiveLoss we already talked about that you have to build the pairs yourself... so nothing special. But I meant the batch triplet losses ;) Maybe you can comment on my last post. :)

nreimers commented 3 years ago

The batch triplet losses use the label information you have. It would be hard to include hard negatives there, but you could try to extend it.
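For illustration, a hedged sketch (an assumption following the repo's batch-hard training examples, not a way to inject explicit hard negatives) of how the label-based batch losses are typically wired up: the loss builds triplets from whatever labels share a batch, so hard-negative classes only help if they actually land in the same batch as the anchor class. Texts, labels, and the model name are placeholders.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.datasets import SentenceLabelDataset

# Placeholder base model
model = SentenceTransformer('distilbert-base-nli-mean-tokens')

labeled_samples = [
    InputExample(texts=['How do I reset my password?'], label=0),
    InputExample(texts=['Steps to change your account password'], label=0),
    InputExample(texts=['How do I reset my router?'], label=1),   # hard-negative class
    InputExample(texts=['Restart your modem and router'], label=1),
]

# Draws samples so that each label appears several times per batch.
train_dataset = SentenceLabelDataset(labeled_samples, samples_per_label=2)
train_dataloader = DataLoader(train_dataset, batch_size=4, drop_last=True)
train_loss = losses.BatchHardTripletLoss(model=model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```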

PaulForInvent commented 3 years ago

@nreimers Yes, I thought you would add hard negatives explicitly per batch, since this loss is computed per batch, right? So you would need to add them by hand via the class label...

That's why I said "by chance" for the standard implementation.

Also, do you think it would enhance learning to present hard ones per batch too?

PS: Maybe I should post this in another thread. I suppose working and training with samples, and especially adding hard negatives, is really a bit of black magic. There are myriad ways to pick, e.g., different hard negatives. And if you choose some hard negatives, you possibly train in a bias towards these sub-semantic differences... This might also be a new area of research: which examples do I need, and how do I pick them, to get my desired results? At the moment it is a big black box, without knowing how and why the model reaches its result, e.g. where there is possibly a bias or a wrong correlation in my semantic space, and which examples I need to pull the sub-semantics in the right direction... Sounds futuristic at the moment. xD

PaulForInvent commented 3 years ago

@nreimers

So using OnlineContrastiveLoss with hard negatives as negative examples should be more accurate than using random (and thus easy) negative examples?

Do you know which of all the online triplet losses (semi-hard, hard, etc.) is more suitable for a semantic search task? All of them give quite similar results, but the BatchHard one is slightly better...

The batch triplet losses use the label information you have. It would be hard to include hard negatives there, but you could try to extend it.

Then by using the corresponding label class (which is a hard-negative class) inside the same batch?

nreimers commented 3 years ago

Hi, yes, hard negatives improve the results.

The different online triplet loss functions are quite similar => similar results.

I have not looked too much into these loss functions, so I cannot say how to include hard negatives.

PaulForInvent commented 3 years ago

@nreimers OnlineContrastiveLoss is also much slower (by a factor of >10) than the batch and MultipleNegativesRanking losses...?

PaulForInvent commented 3 years ago

@nreimers any idea about this speed difference?

nreimers commented 3 years ago

I am not too familiar with the loss anymore, but I think the on-the-fly selection of hard pairs takes some time. Maybe this can be optimized?

But yes, factor 10 is larger than expected.

PaulForInvent commented 3 years ago

But yes, factor 10 is larger than expected.

it was more of a rough ballpark estimate, but I will check it more carefully.

PaulForInvent commented 3 years ago

@nreimers regarding the recently changed underlying transformers library: is it possible that there were some major changes lately? I get new, inconsistent and strange results even when I just load my previously trained model (trained before the library changes). Maybe there are some new things I am not aware of?

nreimers commented 3 years ago

Old models should still yield the same embeddings. There were no changes to that.

So some more information would be helpful.

PaulForInvent commented 3 years ago

Thanks, but I was wrong. Thanks so much anyway!

PaulForInvent commented 3 years ago

@nreimers I just want to say: Thank you!

I am really excited, especially about the great progress on unsupervised learning. I planned to do that too, and now you offer this possibility very easily. On top of that, you also implemented many other unsupervised techniques, not only the common MLM!

👍 👍 😃