SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for ViGEText_17to23 #581

Closed: SamuelCahyawijaya closed this issue 2 months ago

SamuelCahyawijaya commented 3 months ago

Dataloader name: vigetext/vigetext.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?vigetext

Dataset: vigetext
Description: A high-quality dataset with structured guidelines for typing LaTeX formulas in Mathematics, Physics, Chemistry, and Biology. The objective was to cover the entire scope of the Vietnamese General Education Examination from 2017 to 2023, including the challenging 2017 and 2018 examinations, which have been significant for nearly all Vietnamese students in recent years. The exact, correct answers were obtained exclusively from the Vietnamese Ministry of Education.
Subsets: -
Languages: vie
Tasks: Question Answering
License: Unknown (unknown)
Homepage: https://huggingface.co/datasets/uitnlp/ViGEText_17to23?row=27
HF URL: https://huggingface.co/datasets/uitnlp/ViGEText_17to23?row=27
Paper URL: https://dl.acm.org/doi/10.1145/3628797.3628837
chenxwh commented 3 months ago

self-assign