UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

Loss function used for cross-encoder in textual similarity tasks #1371

Closed · YLi999 closed this issue 2 years ago

YLi999 commented 2 years ago

Hi, I'm trying to fine-tune a cross-encoder for textual similarity, following the example provided in "cross-encoder/training_stsbenchmark.py" and setting num_labels=1.
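For reference, here is a minimal sketch of my setup (the model name and training pairs are just placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Sentence pairs with gold similarity scores normalized to [0, 1],
# as in the STSbenchmark example.
train_samples = [
    InputExample(texts=["A man is eating food.", "A man is eating something."], label=0.9),
    InputExample(texts=["A plane is taking off.", "A dog is barking."], label=0.0),
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)

# num_labels=1 gives a single regression-style score per pair;
# CrossEncoder.fit() attaches its own collate function to the dataloader.
model = CrossEncoder("distilroberta-base", num_labels=1)
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=100)
```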

However, I got confused about the loss function when reading the source code (sentence_transformers/cross_encoder/CrossEncoder.py). At line 176, it seems that if num_labels = 1, the loss function is nn.BCEWithLogitsLoss().
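The default seems to be chosen roughly like this (paraphrasing the source from memory; exact line numbers may differ between versions):

```python
# Inside CrossEncoder.fit(), when no loss_fct is passed explicitly:
if loss_fct is None:
    loss_fct = nn.BCEWithLogitsLoss() if self.config.num_labels == 1 else nn.CrossEntropyLoss()
```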

Could you please explain why the loss function is not MSE, which seems to be more suitable for this regression task? Or maybe I misunderstood the code or the approach? Thanks a lot!

nreimers commented 2 years ago

Yes, by default it uses BCEWithLogitsLoss. If you have scores between 0 and 1, BCEWithLogitsLoss is usually better than MSE.
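BCEWithLogitsLoss treats the gold score as a soft target and works directly on the logits, which is numerically stabler than squaring the error of a saturating sigmoid. If you still want MSE, fit() accepts a loss_fct override; something like the following should work (untested sketch, with a sigmoid activation so the predictions stay in [0, 1] like the gold scores):

```python
import torch.nn as nn

# Replace the BCEWithLogitsLoss default with MSE on sigmoid-activated logits.
model.fit(
    train_dataloader=train_dataloader,
    loss_fct=nn.MSELoss(),
    activation_fct=nn.Sigmoid(),
    epochs=1,
    warmup_steps=100,
)
```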

YLi999 commented 2 years ago

Thanks a lot for your reply!