BradyFU / Awesome-Multimodal-Large-Language-Models

:sparkles::sparkles:Latest Advances on Multimodal Large Language Models

Does the performance of GIT2 come from the weights in 2022? #54

Closed StarCycle closed 1 year ago

StarCycle commented 1 year ago

The performance of GIT2 on the leaderboard is quite impressive, given that it has only 5.1B parameters. The original paper was published in 2022, and the repository has not been updated since March 2023. The original GIT and GIT2 models did not use techniques like instruction fine-tuning, yet GIT2 still beats many state-of-the-art models as of August 2023.

Does the performance come from a newer closed-source variant from Microsoft, an open-source version, or the original GIT2 from 2022?

amsword commented 1 year ago

It is the original GIT2 model from 2022, specifically the VQAv2 fine-tuned variant. In the paper, this model achieves 81.92 on test-std, as shown in Table 18 (a). There is no extra fine-tuning with other data.

BradyFU commented 1 year ago

Thanks, that is a good question. GIT2 is a strong model: it achieves fourth place in the perception ranking but only 10th place in the cognition ranking. This suggests that current MLLMs may still have large room for improvement in perception, while holding certain advantages in cognition.