SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Create dataset loader for CUB Bahasa #68

Closed by SamuelCahyawijaya 10 months ago

SamuelCahyawijaya commented 10 months ago

Dataloader name: cub_bahasa/cub_bahasa.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?cub_bahasa

Dataset: cub_bahasa
Description: A semi-translated Indonesian version of CUB-200-2011. The dataset contains thousands of image-text annotation pairs covering 200 bird subcategories. The natural-language descriptions were collected through the Amazon Mechanical Turk (AMT) platform; each is required to be at least 10 words long and to contain no information about subcategories or actions.
Subsets: -
Languages: ind
Tasks: Image-to-Text Generation
License: Unknown (unknown)
Homepage: https://github.com/share424/Indonesian-Text-to-Image-synthesis-with-Sentence-BERT-and-FastGAN/tree/master
HF URL: -
Paper URL: https://arxiv.org/abs/2303.14517
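
The 10-word minimum mentioned in the description could be sanity-checked while writing the loader with something like the sketch below. This is only an illustration: the function name `meets_min_words`, the whitespace-based word count, and the sample captions are all assumptions made here, not part of the dataset or of the SEACrowd codebase.

```python
def meets_min_words(caption: str, min_words: int = 10) -> bool:
    """Return True if the caption has at least `min_words` whitespace-separated tokens.

    Assumption: words are counted by splitting on whitespace, which may differ
    from how the original annotators or authors counted them.
    """
    return len(caption.split()) >= min_words


# Hypothetical captions, invented for illustration only (not taken from the dataset).
captions = [
    "burung ini memiliki sayap berwarna coklat dengan paruh pendek dan perut putih",
    "burung kecil berwarna kuning",
]

# One flag per caption: does it satisfy the stated minimum length?
flags = [meets_min_words(c) for c in captions]
print(flags)
```

A check like this is only a rough filter; because the dataset is described as semi-translated, some captions may no longer match the original English word counts.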

akhdanfadh commented 10 months ago

self-assign