embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Adding a new Dataset: Arabic Reviews of SHEIN #684

Ruqyai closed this issue 2 months ago

Ruqyai commented 2 months ago

Title:

Adding a new Dataset: Arabic Reviews of SHEIN

Description:

This dataset contains Arabic-language reviews of products from the SHEIN online store. The reviews cover various aspects of the products and overall customer satisfaction. The goal of collecting the dataset is to include a wide range of common phrases and terms used in daily conversation, reflecting the diversity of the dialects of the Arabic language, especially in Saudi Arabia.

Languages:

Arabic

Data Columns:

raw_text: (Comment) The text of the review as written by the customer.
text: (Cleaned_Comment) The review text with emojis and repeated characters removed.
label: (Rating) The numerical rating given by the customer, indicating their satisfaction level with the product.
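
For illustration, the cleaning that turns raw_text into text could look roughly like the sketch below. The exact rules used to build the dataset are not stated in this issue, so everything here is an assumption: the emoji Unicode ranges, the choice to collapse runs of three or more identical characters down to one, and the hypothetical helper name `clean_comment`.

```python
import re

# Assumption: emojis are approximated by a few common Unicode ranges.
# This is not an exhaustive emoji definition.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector that often follows an emoji
    "]+"
)

# Assumption: a run of three or more identical characters is an elongation
# (e.g. a stretched letter for emphasis) and is collapsed to one character.
# Runs of exactly two are left alone, since doubled letters can be legitimate.
REPEAT_PATTERN = re.compile(r"(.)\1{2,}")


def clean_comment(raw_text: str) -> str:
    """Hypothetical helper: strip emojis and collapse elongated characters."""
    no_emoji = EMOJI_PATTERN.sub("", raw_text)
    collapsed = REPEAT_PATTERN.sub(r"\1", no_emoji)
    # Normalize the whitespace left behind by removed emojis.
    return " ".join(collapsed.split())
```

Under these assumptions, `clean_comment("جميييييل 😍😍")` returns `"جميل"`, while a word that contains no emojis and no run of three or more identical characters is returned unchanged.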

Link:

https://huggingface.co/datasets/Ruqiya/Arabic_Reviews_of_SHEIN

PR: #710

Ruqyai commented 2 months ago

PR https://github.com/embeddings-benchmark/mteb/pull/710