Closed StarCycle closed 1 year ago
It is the original GIT2 model from 2022, specifically the VQAv2-fine-tuned checkpoint. In the paper, this model achieves 81.92 on test-std, as shown in Table 18 (a). There was no extra fine-tuning on other data.
Thanks, that is a good question. GIT2 is a strong model: it takes fourth place in the perception ranking, but only 10th place in the cognition ranking. This suggests that current MLLMs may still have considerable room for improvement in perception, while having certain advantages in cognition.
The performance of GIT2 on the leaderboard is quite impressive. It has only 5.1B parameters. The original paper was published in 2022, and its repository has not been updated since March 2023. The original GIT and GIT2 models did not use techniques like instruction fine-tuning. Nevertheless, GIT2 still beats many state-of-the-art models as of August 2023.
Does this performance come from a newer closed-source variant from Microsoft, an open-source version, or the original GIT2 from 2022?