UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT
https://www.SBERT.net
Apache License 2.0

Loss function used for cross-encoder in textual similarity tasks #1371

Closed · YLi999 closed this issue 2 years ago

YLi999 commented 2 years ago

Hi, I'm trying to fine-tune a cross-encoder for textual similarity, following the example provided in "cross-encoder/training_stsbenchmark.py" and setting num_labels=1.
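For reference, here is a minimal sketch of my setup (the model name and training pairs are just placeholders):

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Sentence pairs with gold similarity scores normalized to [0, 1],
# as in the STSbenchmark example.
train_samples = [
    InputExample(texts=["A man is eating food.", "A man is eating something."], label=0.9),
    InputExample(texts=["A plane is taking off.", "A dog is barking."], label=0.0),
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)

# num_labels=1 gives a single regression-style score per pair;
# CrossEncoder.fit() attaches its own collate function to the dataloader.
model = CrossEncoder("distilroberta-base", num_labels=1)
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=100)
```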

However, I got confused about the loss function when reading the source code (sentence_transformers/cross_encoder/CrossEncoder.py). At line 176, it seems that if num_labels = 1, the loss function is nn.BCEWithLogitsLoss().
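The default seems to be chosen roughly like this (paraphrasing the source from memory; exact line numbers may differ between versions):

```python
# Inside CrossEncoder.fit(), when no loss_fct is passed explicitly:
if loss_fct is None:
    loss_fct = nn.BCEWithLogitsLoss() if self.config.num_labels == 1 else nn.CrossEntropyLoss()
```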

Could you please explain why the loss function is not MSE, which seems to be more suitable for this regression task? Or maybe I misunderstood the code or the approach? Thanks a lot!

nreimers commented 2 years ago

Yes, by default it uses BCEWithLogitsLoss. If you have scores between 0 and 1, BCEWithLogitsLoss is usually better than MSE.
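BCEWithLogitsLoss treats the gold score as a soft target and works directly on the logits, which is numerically stabler than squaring the error of a saturating sigmoid. If you still want MSE, fit() accepts a loss_fct override; something like the following should work (untested sketch, with a sigmoid activation so the predictions stay in [0, 1] like the gold scores):

```python
import torch.nn as nn

# Replace the BCEWithLogitsLoss default with MSE on sigmoid-activated logits.
model.fit(
    train_dataloader=train_dataloader,
    loss_fct=nn.MSELoss(),
    activation_fct=nn.Sigmoid(),
    epochs=1,
    warmup_steps=100,
)
```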

YLi999 commented 2 years ago

Thanks a lot for your reply!