VectorSpaceLab / OmniGen

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
MIT License
2.58k stars, 191 forks

Why use InternVL2 as the caption model? #126

Closed: JoshonSmith closed this issue 1 day ago

JoshonSmith commented 1 day ago

Great work! Why did you use InternVL2 as the caption model? Did InternVL2 perform best in your experiments?

staoxiao commented 1 day ago

Thanks for your attention to our work! At the start of this project, InternVL was one of the top-ranked models on multimodal understanding benchmarks, so we chose it.