BradyFU / Awesome-Multimodal-Large-Language-Models

:sparkles::sparkles:Latest Advances on Multimodal Large Language Models

There may be another large-scale image-text pair dataset that can be added #112

Closed Vikho closed 7 months ago

Vikho commented 7 months ago

dataset name: COYO
description: COYO-700M is a large-scale dataset containing 747M image-text pairs, along with many other meta-attributes that make it more usable for training various models.
repo: https://github.com/kakaobrain/coyo-dataset/tree/main
huggingface: https://huggingface.co/datasets/kakaobrain/coyo-700m/viewer/default/train
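For anyone who wants to see what the meta-attributes might be used for, here is a minimal sketch of filtering image-text pairs by quality signals. The column names (`url`, `text`, `width`, `height`, `clip_similarity_vitb32`) and the thresholds are assumptions taken from my reading of the dataset card, not something verified in this thread, and the helper names `filter_pairs` and `stream_coyo` are hypothetical.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator


def filter_pairs(
    records: Iterable[Dict[str, Any]],
    min_clip_sim: float = 0.3,   # assumed threshold, tune for your use case
    min_side: int = 200,         # assumed minimum for the shorter image side
) -> Iterator[Dict[str, str]]:
    """Yield {url, text} for records that pass both meta-attribute checks."""
    for rec in records:
        # Column names are assumptions; check them against the real schema.
        sim = rec.get("clip_similarity_vitb32")
        width, height = rec.get("width"), rec.get("height")
        if sim is None or width is None or height is None:
            continue  # skip records with missing metadata
        if sim >= min_clip_sim and min(width, height) >= min_side:
            yield {"url": rec["url"], "text": rec["text"]}


def stream_coyo(limit: int = 5):
    """Read the first `limit` records from the Hub, then filter them.

    Requires the `datasets` package and network access; imported lazily so
    that `filter_pairs` stays usable without it.
    """
    from datasets import load_dataset

    ds = load_dataset("kakaobrain/coyo-700m", split="train", streaming=True)
    return list(filter_pairs(islice(ds, limit)))
```

Streaming avoids downloading the full metadata up front, which matters for a dataset of this size. Note that the dataset distributes image URLs rather than the images themselves, so the pictures still have to be fetched separately.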

xjtupanda commented 7 months ago

Thanks for your reminder. The dataset has been added.