X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

Did you use the full LAION-400M and COYO-700M datasets for pre-training, or just sampled subsets? What is the total number of image-text pairs used for pre-training? #162

Closed · linserSnow closed this issue 11 months ago

MAGAer13 commented 11 months ago

As noted in the appendix, we used only about 400 million image-text pairs sampled from LAION-400M and COYO-700M. We did not use the full datasets.
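
For readers wondering how such a sampled subset might be drawn in practice, here is a minimal sketch using the Hugging Face `datasets` library in streaming mode. This is an illustration, not the authors' actual pipeline: the hub dataset IDs (`laion/laion400m`, `kakaobrain/coyo-700m`) and the interleave-then-take sampling strategy are assumptions; only the ~400M budget comes from the reply above.

```python
# Hypothetical sketch: stream a ~400M-pair subset of LAION-400M and COYO-700M
# rather than materializing the full corpora. Not the mPLUG-Owl pipeline.
from datasets import load_dataset, interleave_datasets

TOTAL_PAIRS = 400_000_000  # ~400M image-text pairs, per the reply above

# Stream both corpora so nothing is downloaded in full up front.
laion = load_dataset("laion/laion400m", split="train", streaming=True)
coyo = load_dataset("kakaobrain/coyo-700m", split="train", streaming=True)

# Alternate between the two streams, then stop at the sampling budget.
mixed = interleave_datasets([laion, coyo])
subset = mixed.take(TOTAL_PAIRS)

for example in subset:
    # Each example carries an image URL and its caption; feed these
    # (after image fetching/filtering) into the pre-training dataloader.
    pass
```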