SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.

Apache License 2.0

Create dataset loader for VlogQA #621

Closed by SamuelCahyawijaya 5 months ago

SamuelCahyawijaya commented 7 months ago

Dataloader name: vlogqa/vlogqa.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?vlogqa

Dataset	vlogqa
Description	VlogQA is a Vietnamese spoken language corpus for machine reading comprehension. It consists of 10,076 question-answer pairs based on 1,230 transcript documents sourced from YouTube videos around food and travel.
Subsets	-
Languages	vie
Tasks	Question Answering
License	Other (other)
Homepage	https://github.com/sonlam1102/vlogqa/tree/main
HF URL	-
Paper URL	-
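
For anyone picking this up, here is a rough sketch of the core of a question-answering dataloader: flattening nested records into one example per question. It assumes a SQuAD-style JSON layout (`data` > `paragraphs` > `qas`); the actual VlogQA file format is not stated in this issue, and the function name, output field names, and sample record below are hypothetical placeholders rather than the real data or the final SEACrowd schema.

```python
from typing import Dict, Iterator, Tuple


def generate_qa_examples(raw: Dict) -> Iterator[Tuple[int, Dict]]:
    """Yield (index, example) pairs from a SQuAD-style dictionary.

    ASSUMPTION: the input follows the SQuAD layout. The real VlogQA
    files may be organised differently.
    """
    idx = 0
    for article in raw.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph.get("context", "")
            for qa in paragraph.get("qas", []):
                # Collect every reference answer text for this question.
                answers = [a["text"] for a in qa.get("answers", [])]
                yield idx, {
                    "id": str(qa.get("id", idx)),
                    "question": qa["question"],
                    "context": context,
                    "answer": answers,
                }
                idx += 1


# Hypothetical placeholder record, not taken from VlogQA.
sample = {
    "data": [
        {
            "title": "placeholder-video",
            "paragraphs": [
                {
                    "context": "placeholder transcript text",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "placeholder question?",
                            "answers": [{"text": "placeholder", "answer_start": 0}],
                        }
                    ],
                }
            ],
        }
    ]
}

examples = list(generate_qa_examples(sample))
```

In an actual dataloader, a generator like this would typically sit inside the builder's example-generation method, with the file read from the locally provided dataset path, since the data is only available after signing the user agreement.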

akhdanfadh commented 6 months ago

@holylovenia If I may, I want to work on this dataset. But it requires a dataset user agreement. Can I submit on behalf of the SEACrowd organization? I'm also unsure if I can receive the dataset before the dataloader implementation.

holylovenia commented 6 months ago

Sure @akhdanfadh, you can try submitting the user agreement first; we can then discuss what to do if you only receive the dataset after the dataloader implementation.

akhdanfadh commented 6 months ago

I just received the dataset, working on it now.

akhdanfadh commented 6 months ago

self-assign