UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

Multilingual Models for Asymmetric Search #1735

Open mahirkukreja opened 1 year ago

mahirkukreja commented 1 year ago

@nreimers Thanks for this framework. It's unbelievable.

I've been exploring multilingual models lately and was wondering whether there is any additional work going on beyond these models when it comes to multilingual embeddings.

Would you happen to have any benchmarks comparing these models with MUSE or CMLM or this one?

Mostly trying to understand the current SOTA when it comes to multilingual asymmetric search.

nreimers commented 1 year ago

Hi, sadly none of these models work well, as they have mostly been trained on sentences.

We will soon release a model at www.cohere.ai that allows multilingual asymmetric search with really impressive results.

mahirkukreja commented 1 year ago

Thanks for the quick response. Appreciate it.

I noticed that we have a multilingual cross encoder. Do you have access to any performance metrics?

nimpy commented 1 year ago

I would also be interested in what the state of the art is for multilingual asymmetric search!

From a first look at this model, I am not impressed with its performance. I also tried this multilingual model, which I don't think is intended for asymmetric search (in any case, it doesn't perform well either).
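
Since no benchmark numbers have been posted in this thread, one way to compare candidate models on your own data is to measure how highly each one ranks the correct passage for a short query. Below is a minimal sketch of that idea using MRR@k with cosine similarity. The names `mrr_at_k`, `cosine`, and `toy_encode` are hypothetical helpers made up for illustration, not part of sentence-transformers, and the toy vectors are hand-written placeholders rather than real model output.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Assumption: an encoder is any callable that maps a list of texts to one vector per text.
Encoder = Callable[[List[str]], List[Vector]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mrr_at_k(
    encode: Encoder,
    queries: List[str],
    passages: List[str],
    relevant: List[int],
    k: int = 10,
) -> float:
    """Mean reciprocal rank of the gold passage within the top-k results.

    relevant[i] is the index (into passages) of the single correct
    passage for queries[i]. Queries whose gold passage is not in the
    top k contribute 0.
    """
    passage_vecs = encode(passages)
    query_vecs = encode(queries)
    total = 0.0
    for query_vec, gold in zip(query_vecs, relevant):
        # Rank all passages by similarity to this query, best first.
        ranked = sorted(
            range(len(passages)),
            key=lambda i: cosine(query_vec, passage_vecs[i]),
            reverse=True,
        )[:k]
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)
    return total / len(queries) if queries else 0.0


if __name__ == "__main__":
    # Hand-written placeholder vectors, NOT real embeddings.
    toy_vectors = {
        "Wie hoch ist der Eiffelturm?": [1.0, 0.0],
        "¿Cuál es la capital de Japón?": [0.0, 1.0],
        "The Eiffel Tower is a wrought-iron tower in Paris ...": [1.0, 0.1],
        "Kyoto was the imperial capital of Japan until 1868 ...": [0.1, 1.0],
        "Tokyo is the capital and most populous city of Japan ...": [1.0, 1.0],
    }

    def toy_encode(texts: List[str]) -> List[Vector]:
        return [toy_vectors[t] for t in texts]

    texts = list(toy_vectors)
    print(mrr_at_k(toy_encode, texts[:2], texts[2:], relevant=[0, 2]))
```

To score a real model, swap `toy_encode` for a wrapper around that model's encoding call. For a sentence-transformers model, `model.encode` should fit this shape, though that has not been tested here. Mixing query and passage languages in the test set is what makes the comparison meaningful for the multilingual case.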