drchrisscheier opened 3 years ago
Currently there are none.
Which one would be interesting for you?
Thank you very much for the quick reply. For my use case, the priorities would be in this order:
Thanks!
Thanks. But which models, and for which tasks, would be relevant?
sorry :) The task is to obtain similarity scores between sentence pairs / (sentence, label) pairs, as shown below. I am currently using the model below, which works very well for me. Another application would be along the lines of your semantic search example (using cosine similarity to pre-filter the top_n candidates, then re-ranking with a cross-encoder).
from sentence_transformers import CrossEncoder

# Score each (sentence, label) pair with a cross-encoder trained on STS-B
model = CrossEncoder('sentence-transformers/ce-distilroberta-base-stsb')
scores = model.predict([["He won the game", "strength"],
                        ["He won the game", "competition"],
                        ["He won the game", "achievement"],
                        ["He won the game", "performance"],
                        ["He won the game", "caring"]])
I am searching for a multilingual cross-encoder model for Indonesian. Are there any such models that could help me?
@dingusagar There are no trained multilingual cross-encoders available so far.
I just made this comment in another similar issue - it should solve this problem.
Has anyone here tried the newest multilingual Cross Encoder model? It uses a multilingual MiniLM model (mMiniLMv2) trained on a multilingual version of the MS MARCO dataset (mMARCO). It doesn't appear to be in the SBERT documentation, but I just stumbled upon it while browsing HF. https://huggingface.co/cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
There isn't any benchmark data, but this paper seems to have used a fairly similar process and shows that these multilingual datasets/models provide very competitive results when compared to monolingual datasets. https://arxiv.org/pdf/2108.13897.pdf
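A minimal sketch of trying that model, assuming it loads like any other cross-encoder checkpoint on the Hub; the query/passage pairs are made-up examples (the second one is Indonesian, "How many people live in Jakarta?" / "Jakarta is home to more than 10 million people"):

from sentence_transformers import CrossEncoder

# Assumption: the checkpoint loads directly through the CrossEncoder class
model = CrossEncoder('cross-encoder/mmarco-mMiniLMv2-L12-H384-v1')
scores = model.predict([
    ["How many people live in Berlin?", "Berlin has a population of about 3.7 million."],
    ["Berapa jumlah penduduk Jakarta?", "Jakarta dihuni oleh lebih dari 10 juta orang."],
])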
Hi nickchomey, this multilingual Cross Encoder model can't be found on Hugging Face.
You can search Hugging Face for models. Here's the same one:
https://huggingface.co/nreimers/mmarco-mMiniLMv2-L12-H384-v1
@dingusagar have you found any Indonesian cross-encoder yet?
@flitzcore, no, I actually did not invest much time in cross-encoders, since they are computationally expensive when you have lots of vectors. The multilingual SBERT bi-encoder models were doing the job for me, with some extra post-processing for my task.
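As a rough sketch of that bi-encoder route, assuming a multilingual SBERT model and cosine similarity as the score (the model name and example sentences are illustrative, not what was actually used):

from sentence_transformers import SentenceTransformer, util

# Assumption: a multilingual bi-encoder; the score is plain cosine similarity
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

sentences = ["Dia memenangkan pertandingan",  # Indonesian: "He won the game"
             "He won the game"]
embeddings = model.encode(sentences, convert_to_tensor=True)
cos_sim = util.cos_sim(embeddings[0], embeddings[1])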
If you're okay with the license, then perhaps https://huggingface.co/jinaai/jina-reranker-v2-base-multilingual is a solid option?
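A minimal sketch of loading it through sentence-transformers, assuming a recent version of the library; the model ships custom modeling code, so it likely needs trust_remote_code=True, and the query/passage pair is an illustrative Indonesian example:

from sentence_transformers import CrossEncoder

# Assumption: recent sentence-transformers versions accept trust_remote_code here
model = CrossEncoder('jinaai/jina-reranker-v2-base-multilingual', trust_remote_code=True)
score = model.predict([["Berapa jumlah penduduk Jakarta?",
                        "Jakarta dihuni oleh lebih dari 10 juta orang."]])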
Hi there, thank you very much for this extremely helpful/useful library!
Quick question: are there multilingual cross-encoders available? Going through the docs, I could not find an explicit reference to or example of leveraging multilingual cross-encoders. When going through the link of available models, I found this candidate: stsb-xlm-r-multilingual.zip - would that be the correct package?
Thank you!