EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval
https://lmms-lab.github.io/

add tinyllava #114

Closed · zjysteven closed this 1 week ago

zjysteven commented 1 week ago

About

As the title says, this PR adds the TinyLLaVA model family.

Reproducing original reported results (help needed from maintainers)

Since the tinyllava package may have dependency conflicts with lmms-eval, I followed the existing llava example: install tinyllava and lmms-eval without their dependencies, and then install the dependencies afterwards from a requirements file, miscs/tinyllava_repr_requirements.txt. The setup script with the full steps is miscs/tinyllava_repr_scripts.sh; a rough sketch of the install order is shown below.
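For clarity, here is a minimal sketch of that install order. The tinyllava checkout path below is a placeholder; the authoritative commands are the ones in miscs/tinyllava_repr_scripts.sh.

```sh
# Sketch of the install order described above; the tinyllava path is a placeholder.
pip install --no-deps -e /path/to/TinyLLaVA             # tinyllava, without its dependency pins
pip install --no-deps -e .                              # lmms-eval, also without dependencies
pip install -r miscs/tinyllava_repr_requirements.txt    # finally, the combined pinned dependencies
```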

| | MME | MMMU_val | MMVet | POPE | ScienceQA_img | TextVQA | GQA | VQAv2 |
|---|---|---|---|---|---|---|---|---|
| reported | 1466.4 | 38.4 | 37.5 | 87.2 | 73.0 | 60.3 | 62.1 | 80.1 (test) |
| reproduced | 1467.0 | 38.6 | 34.5 | 87.3 | 72.9 | 55.8 | 62.2 | 78.2 (val) |

The table above compares the official results (reported in the last row of this table) with those I reproduced using lmms-eval; a representative command is sketched below.
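For context, each number in the reproduced row comes from an lmms-eval run along these lines. The tinyllava model key, checkpoint path, and task names here are illustrative rather than the exact arguments used:

```sh
# Illustrative lmms-eval invocation for the reproduced numbers above;
# the model key, checkpoint, and task names are assumptions, not the exact arguments.
accelerate launch -m lmms_eval \
    --model tinyllava \
    --model_args pretrained=/path/to/TinyLLaVA-checkpoint \
    --tasks mme,mmvet,textvqa_val \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```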

There is a noticeable discrepancy on MMVet and TextVQA. I'm not sure whether it comes from differences in the evaluation setup (I'm new to this field), and I would appreciate it if experienced people (e.g. the maintainers) could take a look.

Luodian commented 1 week ago

@pufanyi I think TextVQA has a different split (w/ or w/o OCR tokens?). Please help confirm, thanks!