UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Multilingual training #950

Open Samarthagarwal23 opened 3 years ago

Samarthagarwal23 commented 3 years ago

Hi Nils, I have English, Chinese and Indonesian text data for a semantic search use case. I have sentence pairs in different language combinations, each with a similarity score. I tried a pretrained XLM-R sentence embedding model, and it performs better than the model I fine-tuned on my sentence-pair data with cosine similarity loss.
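For reference, the cosine similarity loss mentioned here computes the cosine of the two embeddings in each sentence pair and regresses that value onto the gold similarity score with a squared error. In sentence-transformers, `CosineSimilarityLoss` uses a mean squared error for this by default. The sketch below is a plain-Python illustration of that objective, not the library's code; the toy vectors and gold scores are made up.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two (non-zero) embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(
    emb_a: List[Sequence[float]],
    emb_b: List[Sequence[float]],
    gold_scores: List[float],
) -> float:
    """Mean squared error between each pair's cosine similarity and its gold score."""
    errors = [
        (cosine_similarity(u, v) - score) ** 2
        for u, v, score in zip(emb_a, emb_b, gold_scores)
    ]
    return sum(errors) / len(errors)


# Hypothetical 2-d "embeddings" for two sentence pairs (e.g. en-id and en-zh)
# with hypothetical gold similarity scores.
emb_a = [[1.0, 0.0], [0.0, 1.0]]
emb_b = [[1.0, 0.0], [1.0, 0.0]]
gold = [0.8, 0.2]
print(cosine_similarity_loss(emb_a, emb_b, gold))
```

Because only the cosine is supervised, this objective constrains the angle between the two embeddings of a pair but not where the vectors sit in the space.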

Q1. To get good semantic similarity with aligned representations across the three languages, is it better to use MSE loss or cosine similarity loss?
Q2. Would you recommend training a student sentence-embedding model for only the three languages above and then experimenting with that?

nreimers commented 3 years ago

1) MSE works quite well for me.
2) Not sure what you mean.
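Assuming the reply refers to the teacher-student setup that sentence-transformers pairs with `MSELoss` when extending a model to new languages, the objective can be sketched as follows: a teacher embeds the English source sentence, and the student is pushed (by MSE) to produce that same vector for both the source and its translation. The lookup-table "encoders" and the sentences are hypothetical stand-ins for real models and data; this is an illustration of the loss, not the library's training code.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Encoder = Callable[[str], Vector]


def mse(u: Sequence[float], v: Sequence[float]) -> float:
    """Mean squared error between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def distillation_loss(
    teacher: Encoder,
    student: Encoder,
    parallel_pairs: List[Tuple[str, str]],
) -> float:
    """Average MSE pulling the student's embeddings of a source sentence and its
    translation toward the teacher's embedding of the source sentence."""
    total = 0.0
    for source, translation in parallel_pairs:
        target = teacher(source)  # the teacher only ever sees the source sentence
        total += mse(student(source), target)
        total += mse(student(translation), target)
    return total / (2 * len(parallel_pairs))


# Hypothetical lookup tables standing in for real teacher/student encoders.
TEACHER_TABLE = {"good morning": [1.0, 0.0]}
STUDENT_TABLE = {
    "good morning": [1.0, 0.0],
    "selamat pagi": [0.5, 0.5],  # Indonesian translation, not yet aligned
    "早上好": [1.0, 0.0],  # Chinese translation, already aligned
}


def teacher(sentence: str) -> Vector:
    return TEACHER_TABLE[sentence]


def student(sentence: str) -> Vector:
    return STUDENT_TABLE[sentence]


pairs = [("good morning", "selamat pagi"), ("good morning", "早上好")]
print(distillation_loss(teacher, student, pairs))  # prints 0.0625
```

Unlike the cosine objective, MSE pins the translation to the teacher's exact vector, which is what keeps representations aligned across languages. Restricting training to the three languages in question would simply mean using only English-Chinese and English-Indonesian parallel pairs.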