SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for Aya Collection - Templated #481

Closed (SamuelCahyawijaya closed this 4 months ago)

SamuelCahyawijaya commented 7 months ago

Dataloader name: aya_collection_templated/aya_collection_templated.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?aya_collection_templated

Dataset: aya_collection_templated
Description: The Aya Collection is a massive multilingual collection consisting of 513 million instances of prompts and completions covering a wide range of tasks. This dataset covers the templated prompts of the Aya Collection.
Subsets: NusaXSenti, Thai-POS-inst, SCB-MT-2020-prompt, Thai-USEmbassy-prompt, Thai-Wikitionary-inst, UNER-LLM-inst, X-CSQA-inst
Languages: ind, jav, sun, ace, ban, bbc, bjn, min, nij, tha, tgl, vie
Tasks: Chatbot
License: Apache license 2.0 (apache-2.0)
Homepage: https://huggingface.co/datasets/CohereForAI/aya_evaluation_suite
HF URL: -
Paper URL: https://arxiv.org/abs/2402.06619
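
For anyone picking this up, here is a minimal sketch of how the subset and language lists above could be used to filter rows inside the dataloader. It is only an illustration, not the final implementation: the helper name `filter_sea_rows` is hypothetical, and the row field names `dataset_name` and `language` are assumptions about the upstream schema that should be checked against the actual data.

```python
from typing import Dict, Iterable, Iterator, Sequence

# Subset and language lists copied from the issue description above.
SUBSETS = [
    "NusaXSenti",
    "Thai-POS-inst",
    "SCB-MT-2020-prompt",
    "Thai-USEmbassy-prompt",
    "Thai-Wikitionary-inst",
    "UNER-LLM-inst",
    "X-CSQA-inst",
]
LANGUAGES = ["ind", "jav", "sun", "ace", "ban", "bbc", "bjn", "min", "nij", "tha", "tgl", "vie"]


def filter_sea_rows(
    rows: Iterable[Dict],
    subsets: Sequence[str] = SUBSETS,
    languages: Sequence[str] = LANGUAGES,
) -> Iterator[Dict]:
    """Yield only rows that belong to a listed subset AND a listed language.

    Hypothetical helper. The keys "dataset_name" and "language" are
    assumed field names, not confirmed by this issue.
    """
    subset_set = set(subsets)
    language_set = set(languages)
    for row in rows:
        if row.get("dataset_name") in subset_set and row.get("language") in language_set:
            yield row


# Tiny made-up rows, purely to exercise the filter (not real dataset content).
sample_rows = [
    {"id": 1, "dataset_name": "NusaXSenti", "language": "ind"},
    {"id": 2, "dataset_name": "NusaXSenti", "language": "eng"},
    {"id": 3, "dataset_name": "Some-Other-Set", "language": "tha"},
]
kept = list(filter_sea_rows(sample_rows))
```

With the made-up rows above, only the row with `id` 1 is kept, since the other two fail either the language or the subset check.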

tellarin commented 6 months ago

Back working on this. Sorry for the delay.

muhammadravi251001 commented 5 months ago

self-assign