SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for M3Exam #157

Status: Closed (SamuelCahyawijaya closed 9 months ago)

SamuelCahyawijaya commented 11 months ago

Dataloader name: m3exam/m3exam.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?m3exam

Dataset: m3exam
Description: M3Exam is a novel benchmark sourced from real and official human exam questions for evaluating LLMs in a multilingual, multimodal, and multilevel context. In total, M3Exam contains 12,317 questions in 9 diverse languages across three educational levels, and about 23% of the questions require processing images to solve. The M3Exam dataset covers 3 languages spoken in Southeast Asia.
Subsets: Javanese, Thai, Vietnamese
Languages: jav, tha, vie
Tasks: Question Answering
License: Creative Commons Attribution Non Commercial Share Alike 4.0 (cc-by-nc-sa-4.0)
Homepage: https://github.com/DAMO-NLP-SG/M3Exam
HF URL: -
Paper URL: https://arxiv.org/abs/2306.05179
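
For whoever picks this up, the card above could be captured in code roughly as follows. This is only a sketch: the `DatasetCard` class, the `config_names` helper, and the `<dataset>_<lang>_source` naming pattern are assumptions made for illustration, not the confirmed datahub API, so please check the contribution guide for the actual conventions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCard:
    """Hypothetical container for the metadata fields listed in this issue."""
    name: str
    languages: tuple[str, ...]
    task: str
    license: str
    homepage: str
    paper_url: str


# Values copied from the card in this issue.
M3EXAM_CARD = DatasetCard(
    name="m3exam",
    languages=("jav", "tha", "vie"),
    task="Question Answering",
    license="cc-by-nc-sa-4.0",
    homepage="https://github.com/DAMO-NLP-SG/M3Exam",
    paper_url="https://arxiv.org/abs/2306.05179",
)


def config_names(card: DatasetCard) -> list[str]:
    """Build one config name per language subset (naming pattern is assumed)."""
    return [f"{card.name}_{lang}_source" for lang in card.languages]


print(config_names(M3EXAM_CARD))
# ['m3exam_jav_source', 'm3exam_tha_source', 'm3exam_vie_source']
```

Keeping one config per language would let users load only the Javanese, Thai, or Vietnamese subset without pulling the other six languages in M3Exam.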

akhdanfadh commented 11 months ago

self-assign

github-actions[bot] commented 11 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

holylovenia commented 10 months ago

Hi @akhdanfadh, we are freeing up this dataloader issue due to the lack of response so other contributors can #self-assign.

MJonibek commented 10 months ago

self-assign

github-actions[bot] commented 10 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.