embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Add baseline models for Russian #962

Closed. artemsnegirev closed this 3 months ago.

artemsnegirev commented 3 months ago

This PR adds four baseline models for evaluation on the Russian part of MTEB, along with results for 22 tasks (see the table below). It follows up on discussion #705.

Attaching the table of results for review:

| Task | sbert_large_mt_nlu_ru | sbert_large_nlu_ru | rubert-tiny | rubert-tiny2 |
|---|---|---|---|---|
| CEDRClassification | 36.81 | 35.84 | 37.39 | 36.87 |
| GeoreviewClassification | 39.67 | 39.97 | 33.45 | 39.64 |
| GeoreviewClusteringP2P | 58.45 | 59.02 | 34.40 | 44.18 |
| HeadlineClassification | 77.19 | 79.26 | 57.65 | 74.19 |
| InappropriatenessClassification | 64.64 | 62.52 | 54.50 | 58.57 |
| KinopoiskClassification | 50.33 | 49.51 | 41.36 | 49.06 |
| MassiveIntentClassification | 61.42 | 61.09 | 50.10 | 50.83 |
| MassiveScenarioClassification | 68.13 | 67.60 | 52.15 | 59.15 |
| RiaNewsRetrieval | 21.40 | 11.11 | 0.79 | 13.92 |
| RuBQReranking | 56.13 | 46.81 | 35.44 | 46.09 |
| RuBQRetrieval | 29.80 | 12.45 | 3.24 | 10.87 |
| RUParaPhraserSTS | 65.17 | 62.06 | 53.41 | 65.14 |
| RuReviewsClassification | 58.29 | 58.27 | 49.56 | 56.99 |
| RuSciBenchGRNTIClassification | 54.19 | 53.90 | 35.71 | 45.63 |
| RuSciBenchGRNTIClusteringP2P | 52.20 | 50.40 | 29.89 | 41.41 |
| RuSciBenchOECDClassification | 43.80 | 43.04 | 26.51 | 35.48 |
| RuSciBenchOECDClusteringP2P | 47.29 | 46.41 | 27.98 | 38.09 |
| RuSTSBenchmarkSTS | 71.22 | 58.82 | 58.16 | 69.43 |
| SensitiveTopicsClassification | 28.47 | 27.97 | 18.54 | 22.02 |
| STS22 | 56.82 | 50.75 | 47.88 | 50.23 |
| TERRa | 51.97 | 50.17 | 52.85 | 51.87 |
| XNLI | 64.28 | 59.82 | 61.49 | 67.41 |
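
For reference, a minimal sketch of how a row of this table could be reproduced with the mteb package. The Hugging Face model id below is an assumption (the PR itself defines the canonical ids), and the task-selection API may differ slightly between mteb versions:

```python
# Sketch: evaluate one of the baseline models on two of the Russian tasks
# listed above. "cointegrated/rubert-tiny2" is assumed to be the Hugging
# Face id of the rubert-tiny2 model from the table.
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("cointegrated/rubert-tiny2")

# Pick tasks by name; these match rows of the results table.
tasks = mteb.get_tasks(tasks=["GeoreviewClassification", "RuBQRetrieval"])

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results/rubert-tiny2")
```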

Checklist

Adding a model checklist
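
For context on this checklist: new models in mteb are typically registered by declaring a ModelMeta entry. The following is a rough sketch only; exact field names vary across mteb versions (e.g. open_source was later renamed open_weights), and the revision and release date shown are placeholders, not verified values:

```python
# Rough sketch of a ModelMeta registration; field names vary across mteb
# versions, and the revision/release_date values below are placeholders.
from mteb.model_meta import ModelMeta

rubert_tiny2 = ModelMeta(
    name="cointegrated/rubert-tiny2",  # assumed Hugging Face id
    languages=["rus-Cyrl"],
    open_source=True,           # renamed open_weights in later versions
    revision="<commit-sha>",    # placeholder
    release_date="2021-10-28",  # placeholder date
)
```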