Closed JoshonSmith closed 1 day ago
Great work! Why did you use InternVL2 as the caption model? Did InternVL2 perform best in your experiments?
Thanks for your interest in our work! When we started this project, InternVL2 was one of the top-ranked models on multimodal understanding benchmarks, so we chose it.