UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Sentence Transformer encodings #1168

Open Gchand0249 opened 3 years ago

Gchand0249 commented 3 years ago

Hi,

We fine-tuned Sentence Transformers on our domain-specific data (similar to NLI data). It gives high cosine scores to irrelevant suggestions. We used the labels good, bad, and ok when labeling the data.

nreimers commented 3 years ago

Adding a question to your issue would be quite helpful.

Gchand0249 commented 3 years ago

Yes. After extracting embeddings from SBERT, we use the cosine score to sort the results. The issue is that results with a high cosine score are irrelevant, while similar results get lower scores. We are unable to figure out why this is happening.
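
For context, a minimal sketch of the ranking step being described, assuming a fine-tuned model saved locally (the model path, query, and suggestions are placeholders):

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder path to the fine-tuned domain model
model = SentenceTransformer("output/finetuned-domain-model")

query = "example query"
suggestions = ["relevant suggestion", "another suggestion", "irrelevant suggestion"]

# Encode and sort suggestions by cosine similarity to the query
# (util.pytorch_cos_sim in older library versions)
query_emb = model.encode(query, convert_to_tensor=True)
sugg_emb = model.encode(suggestions, convert_to_tensor=True)
scores = util.cos_sim(query_emb, sugg_emb)[0]

for text, score in sorted(zip(suggestions, scores.tolist()), key=lambda x: x[1], reverse=True):
    print(f"{score:.4f}  {text}")
```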

nreimers commented 3 years ago

Likely due to training the model incorrectly.

Gchand0249 commented 3 years ago

Thank you. Does performance depend on the batch size?

Could you please elaborate on what you mean by training it incorrectly? Is it the epochs, the batch size, the data, or the loss?

We trained for 4 epochs with a batch size of 16 and used SoftmaxLoss.
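
For reference, roughly what that training setup looks like in code; the base model and the example pairs are placeholders, and the good/ok/bad labels are assumed to be mapped to integer class ids:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("bert-base-uncased")  # base model is a placeholder

# Pairs labeled good / ok / bad, mapped to integer class ids
label2int = {"good": 0, "ok": 1, "bad": 2}
train_examples = [
    InputExample(texts=["sentence A", "sentence B"], label=label2int["good"]),
    InputExample(texts=["sentence C", "sentence D"], label=label2int["bad"]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=len(label2int),
)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=4, warmup_steps=100)
```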

nreimers commented 3 years ago

SoftmaxLoss is the wrong loss. Have a look at the other loss functions.

Gchand0249 commented 3 years ago

Thanks for your reply. Could you please suggest a preferable loss for training SBERT?

nreimers commented 3 years ago

MultipleNegativesRankingLoss or one of the triplet losses
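
For illustration, a minimal MultipleNegativesRankingLoss setup (base model and pairs are placeholders). The loss only needs (anchor, positive) pairs and treats the other positives in the batch as negatives, so larger batch sizes tend to help:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("bert-base-uncased")  # base model is a placeholder

# (anchor, positive) pairs; all other positives in the batch act as in-batch negatives
train_examples = [
    InputExample(texts=["anchor 1", "positive 1"]),
    InputExample(texts=["anchor 2", "positive 2"]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=32)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```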

Gchand0249 commented 3 years ago

Thank you @nreimers,

Could you please explain why SoftmaxLoss is the wrong loss? On the SBERT website you mention that SoftmaxLoss was used for training SBERT on NLI data, and our data labels are similar to the NLI labels.

nreimers commented 3 years ago

That it works on NLI is rather a coincidence; there is no good logic behind it: https://www.sbert.net/examples/training/nli/README.html#multiplenegativesrankingloss

Gchand0249 commented 3 years ago

Thank you for your suggestion. For MultipleNegativesRankingLoss with hard negatives, the documentation states: "You can also provide one or multiple hard negatives per anchor-positive pair by structuring the data like this: (a_1, p_1, n_1), (a_2, p_2, n_2). Here, n_1 is a hard negative for (a_1, p_1). The loss will use for the pair (a_i, p_i) all p_j (j != i) and all n_j as negatives."

Could you please elaborate on this statement? Does it mean the loss uses p_j and n_j as negatives for a_i?

nreimers commented 3 years ago

Yes
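
In code, the hard-negative format from the quoted docs would look like the sketch below (texts are placeholders); the same MultipleNegativesRankingLoss is used, the InputExamples just get a third text:

```python
from sentence_transformers import InputExample

# (anchor, positive, hard negative) triples. For the pair (a_i, p_i), the loss
# uses every other p_j (j != i) and every n_j in the batch as negatives.
train_examples = [
    InputExample(texts=["a_1", "p_1", "n_1"]),
    InputExample(texts=["a_2", "p_2", "n_2"]),
]
```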

Gchand0249 commented 3 years ago

We did synonym expansion on our data, so in our case most a_i and p_j pairs are positives for each other. How does the loss work in this case? Won't it affect the embeddings?

nreimers commented 3 years ago

Then you have to create a custom DataLoader that ensures a batch does not contain two entries of the same type.
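
A minimal sketch of such a custom loader, similar in spirit to the library's NoDuplicatesDataLoader; the per-example type_id (e.g. the synonym group an example came from) is an assumption and has to be supplied by you:

```python
import math
import random

class NoSameTypeDataLoader:
    """Yields batches in which no two InputExamples share a type id.
    `type_ids` is a hypothetical parallel list identifying each example's group."""

    def __init__(self, examples, type_ids, batch_size):
        self.data = list(zip(examples, type_ids))
        self.batch_size = batch_size
        self.collate_fn = None  # model.fit() replaces this with smart_batching_collate

    def __iter__(self):
        pool = self.data[:]
        random.shuffle(pool)
        while pool:
            batch, seen, rest = [], set(), []
            for example, type_id in pool:
                if len(batch) < self.batch_size and type_id not in seen:
                    batch.append(example)
                    seen.add(type_id)
                else:
                    rest.append((example, type_id))
            pool = rest
            yield self.collate_fn(batch) if self.collate_fn else batch

    def __len__(self):
        return math.ceil(len(self.data) / self.batch_size)
```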

Gchand0249 commented 3 years ago

It is very difficult for us to identify entries of the same type. Is it okay to go with a triplet loss instead?

nreimers commented 3 years ago

Sure

Gchand0249 commented 3 years ago

Thank you. Does the distance_metric in the triplet loss have any impact on performance? We tried the default Euclidean metric and performance was not good, so we are now trying cosine.
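
For reference, a sketch of switching the distance metric for TripletLoss (base model, data, and the margin value are placeholders). The default margin of 5 is sized for Euclidean distances, while cosine distance lies in [0, 2], so the margin usually needs to be lowered as well:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("bert-base-uncased")  # base model is a placeholder

train_examples = [
    InputExample(texts=["anchor", "positive", "negative"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

train_loss = losses.TripletLoss(
    model=model,
    distance_metric=losses.TripletDistanceMetric.COSINE,
    triplet_margin=0.3,  # assumed value; tune on a dev set
)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```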