aimagelab / LLaVA-MORE

LLaVA-MORE: Enhancing Visual Instruction Tuning with LLaMA 3.1

Evaluation result with officially released weights. #4

Open · YuchenLiu98 opened this issue 3 weeks ago

YuchenLiu98 commented 3 weeks ago

Thanks a lot for your excellent work. I wonder how you evaluate the trained model: do you use ./scripts/more/eval/pope.sh, which relies on llava.eval.model_vqa_loader for evaluation (it seems unmodified from LLaVA-1.5)? I downloaded your released model weights (LLaVA_MORE-llama_3_1-8B-finetuning) and ran the evaluation, but the results are extremely low for TextVQA (only 38.66%) and GQA (52.39%). Is there something wrong with my evaluation setup? Thanks a lot for your help.

federico1-creator commented 2 weeks ago

Hi @YuchenLiu98, thank you once again for your interest in our LLaVA-MORE project.

For evaluation, we use the lmms-eval library (https://github.com/EvolvingLMMs-Lab/lmms-eval), into which we have integrated our models.
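For reference, below is a minimal sketch of how such an evaluation could be launched through the lmms-eval CLI. The flags follow the lmms-eval README; the checkpoint identifier, the task names, and the "llava" model wrapper are assumptions rather than the maintainers' exact command, so substitute the values used by the LLaVA-MORE integration.

```python
# Hedged sketch: launching lmms-eval on the released checkpoint via its CLI.
# The checkpoint id, task names, and model wrapper below are assumptions;
# adjust them to match the LLaVA-MORE integration and your local paths.
import subprocess

cmd = [
    "accelerate", "launch", "--num_processes=1", "-m", "lmms_eval",
    "--model", "llava",                          # assumed model wrapper name
    "--model_args", "pretrained=aimagelab/LLaVA_MORE-llama_3_1-8B-finetuning",  # assumed HF id
    "--tasks", "textvqa_val,gqa",                # assumed lmms-eval task names
    "--batch_size", "1",
    "--log_samples",
    "--output_path", "./logs/",
]
subprocess.run(cmd, check=True)  # raises CalledProcessError if the run fails
```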

Regarding the TextVQA results, please note that the numbers reported in our table are computed with the OCR tokens included as part of the input prompt. For more context, see https://github.com/EvolvingLMMs-Lab/lmms-eval/issues/6
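To illustrate the difference, here is a small, hypothetical sketch of how a TextVQA prompt changes when OCR tokens are included. The "Reference OCR token:" wording follows the prompt style used in the original LLaVA-1.5 TextVQA evaluation; the helper itself is not part of either codebase.

```python
# Hypothetical helper (not from LLaVA-MORE or lmms-eval): shows how prepending
# OCR tokens to the question changes the TextVQA prompt, which is why scores
# computed with and without OCR context are not directly comparable.
def build_textvqa_prompt(question: str, ocr_tokens: list[str], use_ocr: bool = True) -> str:
    suffix = "\nAnswer the question using a single word or phrase."
    if use_ocr and ocr_tokens:
        return f"Reference OCR token: {', '.join(ocr_tokens)}\n{question}{suffix}"
    return f"{question}{suffix}"

# With OCR tokens (the setting used for the numbers in the table):
print(build_textvqa_prompt("What brand is the camera?", ["GoPro", "Hero3"]))
# Without OCR tokens (a plain VQA-style prompt):
print(build_textvqa_prompt("What brand is the camera?", [], use_ocr=False))
```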