UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

What are the 50+ languages? #BenderRule #1052

Open andreasvc opened 3 years ago

andreasvc commented 3 years ago

The docs talk about "50+ languages": https://sbert.net/docs/pretrained_models.html#multi-lingual-models

It would be very useful to list them explicitly.

The XLM-R languages are listed explicitly here: https://github.com/facebookresearch/XLM#xlm-r-new-model

When the docs say "50+ languages", does that mean that all of the 100 languages specified above are supported by the paraphrase-xlm-r-multilingual-v1 model? How about the other paraphrase models?

Cheers

nreimers commented 3 years ago

The languages are listed on the site. The same languages are used for all multilingual models (unless stated otherwise).

andreasvc commented 3 years ago

Thanks for the response. Fair enough, the 100 languages for XLM-R are listed right below, I see it now.

Just my 2 cents, but I'd say this could be presented in a more user-friendly way (i.e., don't expect readers to parse the abbreviations in the model names). I'm also still confused about why you would say "50+" when the exact list of 100 languages is right below. Perhaps an anchor link to this list could be added to make it completely unambiguous.

Cheers.