SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for UIT-ViWikiQA #625

Closed by SamuelCahyawijaya 1 month ago

SamuelCahyawijaya commented 3 months ago

Dataloader name: uit_viwikiqa/uit_viwikiqa.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?uit_viwikiqa

Dataset: uit_viwikiqa
Description: UIT-ViWikiQA is a Vietnamese sentence-extraction-based machine reading comprehension dataset created from the UIT-ViQuAD dataset. It comprises 23,074 question-answer pairs based on 5,109 passages from 174 Vietnamese Wikipedia articles.
Subsets: -
Languages: vie
Tasks: Question Answering
License: Other (other)
Homepage: https://sites.google.com/uit.edu.vn/kietnv/datasets
HF URL: -
Paper URL: -

akhdanfadh commented 1 month ago

@holylovenia If I may, I want to try working on this dataset. But it requires a dataset user agreement. Can I submit on behalf of the SEACrowd organization? I'm also unsure if I can receive the dataset before the dataloader implementation.

holylovenia commented 1 month ago

> @holylovenia If I may, I want to try working on this dataset. But it requires a dataset user agreement. Can I submit on behalf of the SEACrowd organization? I'm also unsure if I can receive the dataset before the dataloader implementation.

Sure @akhdanfadh, you can try submitting the user agreement first. Then we can discuss what to do if you only receive the dataset after the dataloader implementation.

akhdanfadh commented 1 month ago

@holylovenia I have received the dataset thanks to the author's fast reply. I'm working on it now.

akhdanfadh commented 1 month ago

self-assign