SamuelCahyawijaya commented 12 months ago

Dataset	coco_35l
Description	COCO-35L is a machine-generated image caption dataset, constructed by translating COCO Captions (Chen et al., 2015) to the other 34 languages using Google’s machine translation API.
Subsets	fil, ind, tha, vie
Languages	fil, ind, tha, vie
Tasks	Image-to-Text Generation
License	Creative Commons Attribution 4.0 (cc-by-4.0)
Homepage	https://google.github.io/crossmodal-3600/
HF URL	-
Paper URL	https://aclanthology.org/2022.emnlp-main.45/

IvanHalimP commented 12 months ago

self-assign

IvanHalimP commented 11 months ago

152520 image IDs are not found in the COCO 2014 training captions; the validation set is OK. I am using the COCO 2014 train and validation sets.
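
For anyone who wants to reproduce this kind of check, here is a minimal sketch of how missing IDs could be counted. It assumes the standard COCO captions annotation layout (a top-level `images` list whose entries carry an `id`). The helper names `coco_image_ids` and `find_missing_ids` and the toy data are hypothetical, not taken from the dataloader.

```python
import json
from typing import Iterable, List, Set


def coco_image_ids(annotation_json: str) -> Set[int]:
    """Collect image ids from a COCO-style captions annotation document."""
    data = json.loads(annotation_json)
    return {int(image["id"]) for image in data["images"]}


def find_missing_ids(candidate_ids: Iterable[int], known_ids: Set[int]) -> List[int]:
    """Return the sorted, de-duplicated candidate ids that are absent from known_ids."""
    return sorted(set(candidate_ids) - known_ids)


# Tiny in-memory stand-in for a captions annotation file (hypothetical data).
toy_annotations = json.dumps({"images": [{"id": 9}, {"id": 25}, {"id": 30}]})
# Hypothetical image ids as they might appear in the translated captions.
toy_coco35l_ids = [9, 25, 42, 42, 77]

missing = find_missing_ids(toy_coco35l_ids, coco_image_ids(toy_annotations))
print(len(missing), missing)
```

With real data, the annotation string would come from reading the COCO 2014 captions file for the split being checked, and the candidate IDs from the COCO-35L records.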
