AILab-CVC / SEED

Official implementation of SEED-LLaMA (ICLR 2024).
https://ailab-cvc.github.io/seed

Train data #25

Open APiaoG opened 4 months ago

APiaoG commented 4 months ago

Hello, and thank you for open-sourcing this outstanding work! I have a question about this field in SEED/MultiModalLLM/configs/data/caption_torchdata_preprocess.yaml: data_dir:

Where can I download the datasets used here? I noticed that the paper says: “We filtered the samples in these datasets based on image resolution, aspect ratio, and visual-textual similarity. We randomly place images or text at the forefront, in order to achieve the generation of captions based on images and vice versa.” If possible, could you release the training data? Thank you very much!

geyuying commented 4 months ago

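We do not own the copyright to these datasets, so we cannot provide the downloaded data. They are public datasets, and you can download them from their respective official websites.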
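
For anyone rebuilding the data from the public sources, the filtering and ordering quoted from the paper could look roughly like the sketch below. It is only an illustration, not the repository's preprocessing code: the `Sample` class, the function names, and all three thresholds are hypothetical, and the similarity score is assumed to be precomputed by some image-text model.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical record for one image-caption pair."""
    width: int
    height: int
    caption: str
    similarity: float  # assumed precomputed image-text similarity score


# Hypothetical thresholds: the thread does not give the values actually used.
MIN_SIDE = 256
MAX_ASPECT_RATIO = 2.0
MIN_SIMILARITY = 0.25


def keep(sample: Sample) -> bool:
    """Filter on image resolution, aspect ratio, and visual-textual similarity."""
    short, long = sorted((sample.width, sample.height))
    if short < MIN_SIDE:
        return False
    if long / short > MAX_ASPECT_RATIO:
        return False
    return sample.similarity >= MIN_SIMILARITY


def order_modalities(image_tokens, text_tokens, rng: random.Random):
    """Randomly put the image or the text first.

    Image-first sequences train captioning; text-first sequences train
    text-to-image generation.
    """
    if rng.random() < 0.5:
        return list(image_tokens) + list(text_tokens)
    return list(text_tokens) + list(image_tokens)
```

The directory of samples that pass `keep` would then be what `data_dir` points to, assuming the config expects already-filtered data.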