Do you have plans to support much bigger datasets, such as LAION?

OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Apache License 2.0

Do you have plans to support much bigger datasets, such as LAION? #208

Closed: Ezra-Yu closed this issue 2 years ago

Ezra-Yu commented 2 years ago

Thank you so much for this awesome work!

I notice that your full pre-training dataset is much smaller than those used by CLIP, ALIGN, and CoCa. More and more open-source datasets are becoming available, such as LAION, and they can greatly improve models, especially on vision tasks.

Do you have plans to support such datasets to improve the model's capabilities and release the checkpoints?

JustinLin610 commented 2 years ago

Yep, we have actually been working on this, but it will take some time, as it is a time- and resource-consuming task.