SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for ViMMRC #577

Closed: SamuelCahyawijaya closed this issue 4 months ago

SamuelCahyawijaya commented 6 months ago

Dataloader name: vimmrc/vimmrc.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?vimmrc

Dataset: vimmrc
Description: ViMMRC is a challenging machine comprehension corpus with multiple-choice questions, intended for research on the machine comprehension of Vietnamese text. It contains 2,783 multiple-choice questions and answers based on a set of 417 Vietnamese texts used to teach reading comprehension to 1st to 5th graders.
Subsets: -
Languages: vie
Tasks: Commonsense Reasoning
License: Unknown (unknown)
Homepage: https://sites.google.com/uit.edu.vn/uit-nlp/datasets#h.1qeaynfs79d1
HF URL: -
Paper URL: https://ieeexplore.ieee.org/document/9247161
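For whoever picks up the loader, here is a minimal sketch of how a single ViMMRC item might be represented once parsed. The class name, field names, and the `group_by_document` helper are hypothetical and chosen for illustration; they are not the corpus's actual file format or the SEACrowd schema. The sketch only encodes what the datasheet states: each question is multiple-choice, and many questions share one of the 417 reading texts.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical record layout for one ViMMRC question; field names are
# illustrative assumptions, not the dataset's real file format.
@dataclass
class MultipleChoiceExample:
    id: str
    document_id: str     # which reading text the question belongs to
    context: str         # the Vietnamese reading text itself
    question: str
    choices: List[str]   # candidate answers shown to the reader
    answer: str          # the correct option, copied from `choices`

    def __post_init__(self) -> None:
        # A multiple-choice item is only well-formed if its answer is one of its choices.
        if self.answer not in self.choices:
            raise ValueError(f"answer {self.answer!r} not in choices for {self.id}")


def group_by_document(examples: List[MultipleChoiceExample]) -> Dict[str, List[MultipleChoiceExample]]:
    """Group questions by their source text, since several questions share one text."""
    groups: Dict[str, List[MultipleChoiceExample]] = {}
    for ex in examples:
        groups.setdefault(ex.document_id, []).append(ex)
    return groups
```

Validating the answer against the choices at construction time means a malformed row fails loudly while loading, rather than silently producing an unanswerable item.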
akhdanfadh commented 4 months ago

self-assign

akhdanfadh commented 4 months ago

Hi @holylovenia. The homepage listed in the datasheet says that I need to

contact us via email: kietnv@uit.edu.vn (Mr. Kiet Nguyen) to sign the corpus user agreement and then receive the corpus.

However, according to the author's blog, the dataset is publicly available. How should I approach this one?

holylovenia commented 4 months ago

Let's go with the author's blog for convenience. 😂 Maybe the policy has changed and the dataset is now publicly available.