SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.

Apache License 2.0

Create dataset loader for M3Exam #157

Closed SamuelCahyawijaya closed 9 months ago

SamuelCahyawijaya commented 11 months ago

Dataloader name: m3exam/m3exam.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?m3exam

Dataset	m3exam
Description	M3Exam is a novel benchmark sourced from real and official human exam questions for evaluating LLMs in a multilingual, multimodal, and multilevel context. In total, M3Exam contains 12,317 questions in 9 diverse languages with three educational levels, where about 23% of the questions require processing images for successful solving. The M3Exam dataset covers 3 languages spoken in Southeast Asia.
Subsets	Javanese, Thai, Vietnamese
Languages	jav, tha, vie
Tasks	Question Answering
License	Creative Commons Attribution Non Commercial Share Alike 4.0 (cc-by-nc-sa-4.0)
Homepage	https://github.com/DAMO-NLP-SG/M3Exam
HF URL	-
Paper URL	https://arxiv.org/abs/2306.05179
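
As a rough illustration of how the metadata above could be carried into a loader, here is a minimal Python sketch that enumerates one configuration per language subset. The `SubsetConfig` class, the `build_configs` helper, the schema labels, and the `<dataset>_<language>_<schema>` naming pattern are assumptions made for this example and are not confirmed by the issue; only the dataset name, language codes, license, and homepage come from the table.

```python
from dataclasses import dataclass

# Values copied from the metadata table in this issue.
DATASET_NAME = "m3exam"
LANGUAGES = {"jav": "Javanese", "tha": "Thai", "vie": "Vietnamese"}
LICENSE = "cc-by-nc-sa-4.0"
HOMEPAGE = "https://github.com/DAMO-NLP-SG/M3Exam"


@dataclass(frozen=True)
class SubsetConfig:
    """Hypothetical container for one language subset (not a SEACrowd class)."""

    name: str
    language_code: str
    language_name: str
    schema: str


def build_configs(schemas=("source", "qa")):
    """Return one config per (language, schema) pair.

    The schema labels and the naming pattern are assumptions for illustration.
    """
    return [
        SubsetConfig(
            name=f"{DATASET_NAME}_{code}_{schema}",
            language_code=code,
            language_name=label,
            schema=schema,
        )
        for code, label in LANGUAGES.items()
        for schema in schemas
    ]


if __name__ == "__main__":
    for cfg in build_configs():
        print(cfg.name, "->", cfg.language_name)
```

Keeping the language codes in a single mapping means a subset can be added or removed in one place, and every derived config name stays consistent with it.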

akhdanfadh commented 11 months ago

self-assign

github-actions[bot] commented 11 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.

holylovenia commented 10 months ago

Hi @akhdanfadh, we are freeing up this dataloader issue due to the lack of response so other contributors can #self-assign.

MJonibek commented 10 months ago

self-assign

github-actions[bot] commented 10 months ago

Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.