Create dataset loader for XM3600

Dataset	xm3600
Description	Crossmodal-3600 dataset (XM3600 for short), a geographically diverse set of 3600 images annotated with human-generated reference captions in 36 languages. The images were selected from across the world, covering regions where the languages are spoken, and annotated with captions that achieve consistency in terms of style across all languages, while avoiding annotation artifacts due to direct translation. The languages covered in the dataset include Filipino, Indonesian, Thai, and Vietnamese.
Subsets	XM3600_fil, XM3600_id, XM3600_th, XM3600_vi
Languages	fil, ind, tha, vie
Tasks	Image-to-Text Generation
License	Creative Commons Attribution 4.0 (cc-by-4.0)
Homepage	https://google.github.io/crossmodal-3600/
HF URL	https://huggingface.co/datasets/dinhanhx/crossmodal-3600
Paper URL	https://aclanthology.org/2022.emnlp-main.45/
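
The subset names above use the dataset's own locale codes (`id`, `th`, `vi`), while the Languages row uses three-letter codes (`ind`, `tha`, `vie`). A loader will probably need a mapping between the two. The following is a minimal sketch of that mapping only; the helper name `subset_to_config_name` and the `xm3600_<lang>_<schema>` naming pattern are assumptions made for illustration, not a confirmed convention of the datahub.

```python
# Mapping from the subset names listed in this issue to the language codes
# listed in the "Languages" row. The pairing follows the order given above.
SUBSET_TO_LANG = {
    "XM3600_fil": "fil",
    "XM3600_id": "ind",
    "XM3600_th": "tha",
    "XM3600_vi": "vie",
}


def subset_to_config_name(subset: str, schema: str = "source") -> str:
    """Build a config name for a subset.

    Hypothetical helper: the 'xm3600_<lang>_<schema>' pattern is an
    assumption for illustration, not a documented naming rule.
    """
    lang = SUBSET_TO_LANG[subset]  # raises KeyError for unknown subsets
    return f"xm3600_{lang}_{schema}"


# One config name per subset, in the order the subsets are listed.
config_names = [subset_to_config_name(s) for s in SUBSET_TO_LANG]
```

Keeping the mapping in a single dictionary means an unknown subset fails loudly with a `KeyError` rather than silently producing a malformed config name.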

SEACrowd / seacrowd-datahub