UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Finetuning an STS-like finetuned model on binary data #727

Open tide90 opened 3 years ago

tide90 commented 3 years ago

Hey,

what would be the best way to use a great finetuned model like https://huggingface.co/T-Systems-onsite/cross-en-de-roberta-sentence-transformer

and finetune it for my task? My task is semantic similarity of two sentences, so I have data with binary labels (0, 1).

@nreimers @PhilipMay

Have you guys only finetuned this model on semantic similarity scores (like in the original STS finetuning task), or do you have any experience using it with binary labels? What loss would then be suitable? I ask because I do not know whether I would also have to finetune it on score labels.

Best regards.

nreimers commented 3 years ago

ContrastiveLoss and MultipleNegativesRankingLoss would be suitable.
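
For binary-labeled pairs, a minimal ContrastiveLoss setup with the classic `model.fit` API could look like the sketch below; the example sentences and hyperparameters are placeholders, not values from this thread:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Start from the already finetuned cross-lingual model
model = SentenceTransformer("T-Systems-onsite/cross-en-de-roberta-sentence-transformer")

# Binary labels: 1 = similar pair, 0 = dissimilar pair (placeholder data)
train_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating a meal."], label=1),
    InputExample(texts=["A man is eating food.", "A girl is playing guitar."], label=0),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# ContrastiveLoss pulls label-1 pairs together and pushes label-0 pairs apart up to a margin
train_loss = losses.ContrastiveLoss(model=model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=100,
)
```

MultipleNegativesRankingLoss would instead use only the label-1 pairs and treat the other sentences in a batch as negatives.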

tide90 commented 3 years ago

@nreimers So it makes no great difference if the pretrained model was finetuned on a different task and loss?

nreimers commented 3 years ago

If the task is similar, then using the fine-tuned model can be helpful.

tide90 commented 3 years ago

@nreimers > If the task is similar, then using the fine-tuned model can be helpful.

That was my question. :-) What about my case, where I have binary labels but the underlying model was trained on score labels? I do not know whether you can call that a similar task. :) See my opening post.

nreimers commented 3 years ago

Try it and see what happens

tide90 commented 3 years ago

Yes, I already did, but maybe you have some experience with that. :-) I think it is basically a matter of going from a cosine loss to a loss on binary data, and the question is what this might change.

localoca5 commented 2 years ago

Hi @nreimers @tide90, I have the same issue. My data are 0/1 labeled (similar to an NLI task), and I found that performance on semantic search became much worse than before finetuning. I'd appreciate any thoughts or experience on this issue.

(I am using distiluse-base-multilingual-cased-v1 with ContrastiveLoss and a 32-dim Dense layer for dimension reduction; dataset language: Chinese.)
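
A sketch of that kind of setup, appending a Dense layer on top of the pretrained model to reduce its 512-dim output to 32 dims; the module name and Tanh activation below are assumptions, not the exact configuration used here:

```python
from torch import nn
from sentence_transformers import SentenceTransformer, models

# distiluse-base-multilingual-cased-v1 produces 512-dim sentence embeddings
model = SentenceTransformer("distiluse-base-multilingual-cased-v1")

# Append a Dense module that maps the embedding down to 32 dims
dense = models.Dense(
    in_features=model.get_sentence_embedding_dimension(),
    out_features=32,
    activation_function=nn.Tanh(),
)
model.add_module("dense_reduction", dense)
```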

nreimers commented 2 years ago

ContrastiveLoss works quite badly here; use CosineSimilarityLoss or maybe MultipleNegativesRankingLoss if applicable. 32 dimensions is also quite low.
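
CosineSimilarityLoss treats the label as a target cosine similarity, so binary labels can simply be cast to floats. A rough sketch with placeholder data and hyperparameters:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("distiluse-base-multilingual-cased-v1")

# Binary labels mapped to target cosine similarities of 0.0 / 1.0 (placeholder pairs)
train_examples = [
    InputExample(texts=["first sentence", "a paraphrase of it"], label=1.0),
    InputExample(texts=["first sentence", "an unrelated sentence"], label=0.0),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# The loss minimizes the squared error between cosine(u, v) and the label
train_loss = losses.CosineSimilarityLoss(model=model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```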

localoca5 commented 2 years ago

> ContrastiveLoss works quite badly here; use CosineSimilarityLoss or maybe MultipleNegativesRankingLoss if applicable. 32 dimensions is also quite low.

Thank you @nreimers! I will try out these two losses and will let you know when I get some results. Using 32 dims is mainly because it is required by my workmate, who is working in a recommender system group; a dimension higher than 32 would make the recommender slower, which they are not OK with.
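
For the MultipleNegativesRankingLoss variant, only the positive (label-1) pairs are used, and the other sentences in the batch serve as negatives. A possible sketch with placeholder data:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("distiluse-base-multilingual-cased-v1")

# Keep only the label-1 pairs; the rest of the batch provides the negatives (placeholder pairs)
positive_pairs = [
    InputExample(texts=["anchor sentence", "its matching sentence"]),
    InputExample(texts=["another anchor", "another matching sentence"]),
]

train_dataloader = DataLoader(positive_pairs, shuffle=True, batch_size=32)
train_loss = losses.MultipleNegativesRankingLoss(model=model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```

Larger batch sizes generally help this loss, since each batch then contains more in-batch negatives.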