Category in Classification Datasets

embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark

https://arxiv.org/abs/2210.07316

Apache License 2.0


Category in Classification Datasets #677

Closed: Art3mis0707 closed this issue 6 months ago

Art3mis0707 commented 6 months ago

Would sentiment analysis of individual words fall under the "s2s" category?

https://huggingface.co/datasets/senti_lex

This is the dataset in question. I think it would be helpful because it covers 7 very low-resource languages (Ido, Breton, Western Frisian, Walloon, Volapük, Norwegian Nynorsk, Aragonese) that have not been included in any other dataset. @KennethEnevoldsen @isaac-chung @imenelydiaker

KennethEnevoldsen commented 6 months ago

@Art3mis0707 I don't believe senti-lex is a reasonable fit for MTEB. The benchmark concerns itself with document representations, and I would argue that single-word representations do not fall into that category.