SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for Aya Evaluation Suite #479

Closed SamuelCahyawijaya closed 5 months ago

SamuelCahyawijaya commented 7 months ago

Dataloader name: aya_evaluation_suite/aya_evaluation_suite.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?aya_evaluation_suite

| Dataset | aya_evaluation_suite |
| --- | --- |
| Description | Aya Evaluation Suite contains a total of 26,750 open-ended conversation-style prompts to evaluate multilingual open-ended generation quality. |
| Subsets | dolly_mt_ceb, dolly_mt_sun, dolly_mt_jav, dolly_mt_zsm, dolly_mt_ind, dolly_mt_ace, dolly_mt_vie, dolly_mt_lao, dolly_mt_min, dolly_mt_mya, dolly_mt_bjn, dolly_mt_tam, dolly_mt_khm, dolly_mt_tha |
| Languages | ceb, tha, mya, zsm, jav, ind, vie, sun, ace, bjn, khm, lao, min |
| Tasks | Chatbot |
| License | Apache License 2.0 (apache-2.0) |
| Homepage | https://huggingface.co/datasets/CohereForAI/aya_evaluation_suite |
| HF URL | - |
| Paper URL | https://arxiv.org/abs/2402.06619 |
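
For whoever picks up the dataloader, here is a minimal sketch of a consistency check between the Subsets and Languages rows above. It assumes every subset name is the prefix `dolly_mt_` followed by a language code; the issue does not state that convention explicitly, and the helper name `subset_language` is hypothetical, not part of the SEACrowd codebase.

```python
# Subset and language lists copied verbatim from the issue.
SUBSETS = [
    "dolly_mt_ceb", "dolly_mt_sun", "dolly_mt_jav", "dolly_mt_zsm",
    "dolly_mt_ind", "dolly_mt_ace", "dolly_mt_vie", "dolly_mt_lao",
    "dolly_mt_min", "dolly_mt_mya", "dolly_mt_bjn", "dolly_mt_tam",
    "dolly_mt_khm", "dolly_mt_tha",
]
LANGUAGES = [
    "ceb", "tha", "mya", "zsm", "jav", "ind", "vie",
    "sun", "ace", "bjn", "khm", "lao", "min",
]

# Assumption: subset name = PREFIX + language code.
PREFIX = "dolly_mt_"


def subset_language(subset: str) -> str:
    """Return the language code embedded in a subset name (hypothetical helper)."""
    if not subset.startswith(PREFIX):
        raise ValueError(f"unexpected subset name: {subset!r}")
    return subset[len(PREFIX):]


subset_langs = {subset_language(s) for s in SUBSETS}

# Codes that appear in one row of the issue but not the other.
missing_from_languages = sorted(subset_langs - set(LANGUAGES))
missing_from_subsets = sorted(set(LANGUAGES) - subset_langs)

print("in Subsets but not Languages:", missing_from_languages)
print("in Languages but not Subsets:", missing_from_subsets)
```

Run against the lists as written, the check reports `tam` as present in Subsets but absent from Languages, so it may be worth confirming whether that subset is meant to be in scope before implementing the loader.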

tellarin commented 6 months ago

Back working on this. Sorry for the delay.

muhammadravi251001 commented 5 months ago

self-assign