SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.

Apache License 2.0

Create dataset loader for UIT-ViWikiQA #625

Closed: SamuelCahyawijaya closed this issue 6 months ago

SamuelCahyawijaya commented 7 months ago

Dataloader name: uit_viwikiqa/uit_viwikiqa.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?uit_viwikiqa

Dataset	uit_viwikiqa
Description	UIT-ViWikiQA is a Vietnamese sentence extraction-based machine reading comprehension dataset. It is created from the UIT-ViQuAD dataset. It comprises 23,074 question-answer pairs based on 5,109 passages from 174 Vietnamese Wikipedia articles.
Subsets	-
Languages	vie
Tasks	Question Answering
License	Other (other)
Homepage	https://sites.google.com/uit.edu.vn/kietnv/datasets
HF URL	-
Paper URL	-

akhdanfadh commented 6 months ago

@holylovenia If I may, I want to try working on this dataset. But it requires a dataset user agreement. Can I submit on behalf of the SEACrowd organization? I'm also unsure if I can receive the dataset before the dataloader implementation.

holylovenia commented 6 months ago

Sure @akhdanfadh, you can try submitting the user agreement first, and then we can discuss it if you receive the dataset after the dataloader implementation.

akhdanfadh commented 6 months ago

@holylovenia I have received the dataset thanks to the author's fast reply. I'm working on it now.

akhdanfadh commented 6 months ago

self-assign