VectorSpaceLab / OmniGen

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
MIT License
2.83k stars 219 forks

Why use InternVL2 as the caption model? #126

Closed by JoshonSmith 1 week ago

JoshonSmith commented 1 week ago

Great work! Why did you use InternVL2 as the caption model? Did InternVL2 perform best in your experiments?

staoxiao commented 1 week ago

Thanks for your interest in our work! When we started this project, InternVL2 was one of the top-ranked models on multimodal understanding benchmarks, so we chose it.