YuchenLiu98 opened this issue 3 months ago

Thanks a lot for your excellent work. I wonder how you evaluate the trained model: do you use ./scripts/more/eval/pope.sh, which relies on llava.eval.model_vqa_loader for evaluation (it seems unmodified from LLaVA-1.5)? I downloaded your released model weights (LLaVA_MORE-llama_3_1-8B-finetuning) and ran the evaluation, but I get extremely low results on TextVQA (only 38.66%) and GQA (52.39%). Is there something wrong with my evaluation? Thanks a lot for your help.
Hi @YuchenLiu98, thank you once again for your interest in our LLaVA-MORE project.
For evaluation, we use the lmms-eval library (https://github.com/EvolvingLMMs-Lab/lmms-eval), into which we have integrated our models.
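For reference, here is a minimal sketch of launching such an evaluation through lmms-eval's command-line entry point. The flag names follow the lmms-eval README, but the registered model name `llava`, the task names `textvqa`/`gqa`, and the HuggingFace repo id are assumptions for illustration, not confirmed in this thread:

```python
# Sketch: evaluating a checkpoint via lmms-eval's CLI instead of the
# original llava.eval.model_vqa_loader scripts. The model name "llava",
# the task names "textvqa"/"gqa", and the pretrained repo id below are
# assumptions; adjust them to whatever your lmms-eval fork registers.
import subprocess

subprocess.run(
    [
        "accelerate", "launch", "--num_processes=1",
        "-m", "lmms_eval",
        "--model", "llava",
        "--model_args", "pretrained=aimagelab/LLaVA_MORE-llama_3_1-8B-finetuning",
        "--tasks", "textvqa,gqa",
        "--batch_size", "1",
        "--log_samples",
        "--output_path", "./logs/",
    ],
    check=True,
)
```

Evaluating through lmms-eval rather than the unmodified LLaVA-1.5 scripts should reproduce the prompt format our reported numbers were computed with.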
Regarding the TextVQA results, please note that the numbers reported in our table are computed with the OCR tokens included as part of the input prompt (see https://github.com/EvolvingLMMs-Lab/lmms-eval/issues/6).
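To make the OCR point concrete, below is an illustrative sketch (not the authors' exact code) of how OCR tokens are typically folded into a TextVQA question, following the "Reference OCR token:" convention used by the LLaVA-1.5 evaluation files; the helper name and example inputs are hypothetical:

```python
# Illustrative sketch: appending detected OCR tokens to a TextVQA question,
# following the "Reference OCR token:" convention of the LLaVA-1.5 eval files.
# The helper name and example inputs are hypothetical.
def build_textvqa_prompt(question: str, ocr_tokens: list[str]) -> str:
    prompt = question.strip()
    if ocr_tokens:
        prompt += "\nReference OCR token: " + ", ".join(ocr_tokens)
    return prompt

# The model sees the OCR words alongside the question; scoring without
# them (e.g., with a loader that drops the OCR line) can yield a
# noticeably lower TextVQA accuracy.
print(build_textvqa_prompt("What is written on the sign?", ["STOP", "4-WAY"]))
```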