As the title says, this PR adds the TinyLLaVA model family.

## Reproducing original reported results (help needed from maintainers)
Since the `tinyllava` package may have some dependency conflicts with `lmms-eval`, I followed the existing example of `llava`: build `tinyllava` and `lmms-eval` without dependencies, then install the dependencies from a requirements file, `miscs/tinyllava_repr_requirements.txt`. The setup script with the described steps is `miscs/tinyllava_repr_scripts.sh`.
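For readers skimming this PR, a minimal sketch of what the setup amounts to (the canonical steps are in `miscs/tinyllava_repr_scripts.sh`; the repo paths below are placeholders):

```bash
# Sketch of the dependency-conflict workaround: install both packages
# without their declared dependencies, then install a pinned set of
# dependencies afterwards. Paths are placeholders for local checkouts.
pip install --no-deps -e /path/to/TinyLLaVA   # tinyllava without its deps
pip install --no-deps -e /path/to/lmms-eval   # lmms-eval without its deps
pip install -r miscs/tinyllava_repr_requirements.txt  # resolved deps, installed last
```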
|            | MME    | MMMU_val | MMVet | POPE | ScienceQA_img | TextVQA | GQA  | VQAv2       |
|------------|--------|----------|-------|------|---------------|---------|------|-------------|
| reported   | 1466.4 | 38.4     | 37.5  | 87.2 | 73.0          | 60.3    | 62.1 | 80.1 (test) |
| reproduced | 1467.0 | 38.6     | 34.5  | 87.3 | 72.9          | 55.8    | 62.2 | 78.2 (val)  |
The table above compares the official results reported in the last row of this table with those I reproduced using `lmms-eval`.

There are noticeable discrepancies on MMVet and TextVQA. I'm not sure whether they come from differences in the evaluation setup (I'm new to this field), and I would appreciate it if experienced people (e.g. the maintainers) could take a look.
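For context, the reproduced row comes from runs along these lines. This is a sketch only: the model name `tinyllava`, the `pretrained=` key in `--model_args`, and the checkpoint ID are assumptions following the existing `llava` integration, not verified values.

```bash
# Hedged sketch of one evaluation run; adjust --tasks per benchmark.
# `tinyllava` / `pretrained=...` are assumed to mirror the llava example.
python -m lmms_eval \
    --model tinyllava \
    --model_args pretrained=tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B \
    --tasks mmvet \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```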
For MMVet, `lmms-eval` seems to use GPT for evaluation, as the final `results.json` has a `gpt_eval_score,none` field for the MMVet result (this is where I'm getting the 34.5 number). However, per TinyLLaVA's evaluation instructions, MMVet results need to be submitted to an evaluation server.
For TextVQA (val), I'm taking the number from the `exact_match,none` field under `textvqa_val` in `results.json`. However, I also see a `submission,none` field there, and I'm not sure whether that means TextVQA results need to be submitted somewhere. Meanwhile, it's unclear which metric TinyLLaVA reports for TextVQA (val), due to limited documentation.
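For concreteness, this is where I'm reading the numbers from; a sketch assuming `results.json` follows the lm-eval-harness-style layout (a top-level `results` object keyed by task name):

```bash
# Sketch: pull the two metrics discussed above out of results.json,
# assuming a top-level "results" object keyed by task name.
jq '.results.mmvet["gpt_eval_score,none"]' results.json       # -> 34.5 in my run
jq '.results.textvqa_val["exact_match,none"]' results.json    # the number I report
jq '.results.textvqa_val["submission,none"]' results.json     # purpose unclear to me
```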