SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for Okapi m-ARC #474

Status: Closed (SamuelCahyawijaya closed this issue 6 months ago)

SamuelCahyawijaya commented 8 months ago

Dataloader name: okapi_m_arc/okapi_m_arc.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?okapi_m_arc

Dataset: okapi_m_arc
Description: m-ARC is a multilingual version of ARC, a dataset of 7,787 genuine grade-school level, multiple-choice science questions assembled to encourage research in advanced question-answering and reasoning.
Subsets: -
Languages: ind, vie
Tasks: Commonsense Reasoning
License: Creative Commons Attribution Non Commercial 4.0 (cc-by-nc-4.0)
Homepage: http://nlp.uoregon.edu/download/okapi-eval/datasets/
HF URL: https://huggingface.co/datasets/jon-tow/okapi_arc_challenge
Paper URL: https://arxiv.org/abs/2307.16039

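For whoever picks this up, here is a rough sketch of how one multiple-choice record might be flattened into a question/choices/answer dictionary. It assumes the Hugging Face copy keeps the original ARC field layout (`id`, `question`, `choices.label`, `choices.text`, `answerKey`), which should be checked against the actual data. The helper name `arc_to_qa`, the output keys, and the sample record are all hypothetical and are not taken from the dataset or from the SEACrowd schema.

```python
from typing import Any, Dict


def arc_to_qa(record: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one ARC-style record (assumed layout) into a simple QA dict."""
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    # Map each choice label (e.g. "A") to its answer text.
    label_to_text = dict(zip(labels, texts))
    return {
        "id": record["id"],
        "question": record["question"],
        "choices": list(texts),
        # Raises KeyError if answerKey does not match any choice label.
        "answer": [label_to_text[record["answerKey"]]],
    }


# Made-up placeholder record for illustration only, not real dataset content.
sample = {
    "id": "example-0",
    "question": "Which object is attracted to a magnet?",
    "choices": {
        "label": ["A", "B", "C", "D"],
        "text": ["a wooden spoon", "an iron nail", "a glass cup", "a rubber band"],
    },
    "answerKey": "B",
}

example = arc_to_qa(sample)
```

Resolving the answer label to its text keeps the output independent of whether a given row uses letter or number labels. A real dataloader would map these fields onto whatever schema the project's template prescribes.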
tellarin commented 8 months ago

self-assign

tellarin commented 7 months ago

Back working on this. Sorry for the delay.

holylovenia commented 7 months ago

> Back working on this. Sorry for the delay.

Sure, please let us know if you need any help, @tellarin!

holylovenia commented 6 months ago

Hi @tellarin, thanks for taking this PR. Just a heads up: due to the delay, I would like to let @SamuelCahyawijaya take over this issue if there's no update by Tuesday, 16 April 2024 EoD AoE (23:59 UTC-12).