SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for MSVD-Indonesian #80

Closed SamuelCahyawijaya closed 9 months ago

SamuelCahyawijaya commented 1 year ago

Dataloader name: id_msvd/id_msvd.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?id_msvd

Dataset id_msvd
Description MSVD-Indonesian is derived from the MSVD (Microsoft Video Description) dataset by translating its captions with a machine translation service (Google Translate API). The dataset can be used for multimodal video-text tasks, including text-to-video retrieval, video-to-text retrieval, and video captioning. Like the original English dataset, MSVD-Indonesian contains about 80k video-text pairs.
Subsets -
Languages ind
Tasks Text Retrieval, Image-to-Text Generation
License MIT (mit)
Homepage https://github.com/willyfh/msvd-indonesian
HF URL -
Paper URL https://arxiv.org/abs/2306.11341

akhdanfadh commented 1 year ago

self-assign