SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for UIT-ViQuAD #575

Open SamuelCahyawijaya opened 3 months ago

SamuelCahyawijaya commented 3 months ago

Dataloader name: uit_viquad/uit_viquad.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?uit_viquad

Dataset: uit_viquad
Description: Vietnamese Question Answering Dataset (UIT-ViQuAD), a new dataset for Vietnamese, a low-resource language, for evaluating machine reading comprehension (MRC) models. It comprises over 23,000 human-generated question-answer pairs based on 5,109 passages from 174 Vietnamese Wikipedia articles.
Subsets: -
Languages: vie
Tasks: Question Answering
License: Unknown
Homepage: https://sites.google.com/uit.edu.vn/uit-nlp/datasets
HF URL: -
Paper URL: https://aclanthology.org/2020.coling-main.233/

patrickamadeus commented 3 months ago

self-assign

patrickamadeus commented 3 months ago

Hi! I emailed kietnv@uit.edu.vn to request the dataset, but I haven't received a response. Can anyone help, or does anyone know the dataset owner directly?

cc: @SamuelCahyawijaya @holylovenia

holylovenia commented 3 months ago

> Hi! I emailed kietnv@uit.edu.vn to request the dataset, but I haven't received a response. Can anyone help, or does anyone know the dataset owner directly?
>
> cc: @SamuelCahyawijaya @holylovenia

Hi @patrickamadeus, I'm not sure if any of us knows him personally. :( I guess our only option is to wait for now.

holylovenia commented 2 months ago

Hi @patrickamadeus, is there any reply from the dataset owner?

patrickamadeus commented 2 months ago

Hi @holylovenia, sadly no :(

akhdanfadh commented 2 months ago

@patrickamadeus I contacted him yesterday and got an instant reply. Try emailing him again.