LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
ScienceQA accuracy is far too low #7

zhangbaijin commented 4 months ago

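Thank you very much for open-sourcing this work. I hit a bug when downloading your pretrained model, so I downloaded the llava-7b-lora model instead, the same situation as in issue 5. The command I used to test ScienceQA is: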

python -m llava.eval.model_vqa_science \
    --model-base /mnt/xiaofeng.zxf/models/vicuna-7b-v1.5 \
    --model-path /mnt/xiaofeng.zxf/code/LLaVA-PruMerge/llava-v1.5-7b-lora \
    --question-file /mnt/xiaofeng.zxf/llava_test_CQM-A_image.json \
    --image-folder /mnt/xiaofeng.zxf/ScienceQA_DATA/test \
    --answers-file ./llava-v1.5-7b.jsonl \
    --temperature 0 \
    --conv-mode vicuna_v1

However, the result is only IMG-Accuracy: 9.27%. Is your pretrained model correct?

42Shawn commented 4 months ago

IMG-Accuracy: 9.27% is too low; even random guessing would reach that number. Something is probably wrong in your eval setup.
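
For reference, the chance-level accuracy of a multiple-choice benchmark can be estimated as the mean of 1/(number of options) over all questions. The snippet below is a minimal sketch of that arithmetic; the function name and the option counts are hypothetical and are not taken from the ScienceQA data.

```python
def chance_accuracy(option_counts):
    """Expected accuracy of uniform random guessing.

    option_counts: number of answer options for each question.
    """
    if not option_counts:
        raise ValueError("need at least one question")
    return sum(1.0 / n for n in option_counts) / len(option_counts)


# Hypothetical example: three questions with 2, 4 and 4 options.
baseline = chance_accuracy([2, 4, 4])
print(f"chance-level accuracy: {baseline:.2%}")
```

A measured accuracy well below this baseline usually points to a problem in loading the model or parsing its answers rather than to a weak model.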

zhangbaijin commented 4 months ago

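At first I downloaded the pretrained model you provided, but testing raised an error. Then I saw the same problem in issue 5 and switched to the original llava-1.5-lora pretrained model, which gave this result. The test also took more than 40 minutes, which is strange. Could you confirm that the model you provided is complete?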

mu-cai commented 3 months ago

Hi, we updated the script. Are you able to reproduce results now?

liuxiaozhu01 commented 2 months ago

Hi, @42Shawn @mu-cai I just ran into a similar problem. I tried to evaluate on TextVQA without fine-tuning. The script I ran is scripts/v1_5/eval/testvqa.sh:

CUDA_VISIBLE_DEVICES=0 python -m llava.eval.model_vqa_loader \
    --model-base /root/home/workspace/LLM/vicuna/lmsys/vicuna-7b-v1.5 \
    --model-path /root/home/workspace/LLM/llava/llava-v1.5-7b \
    --question-file ./playground/data/eval/textvqa/llava_textvqa_val_v051_ocr.jsonl \
    --image-folder ./playground/data/eval/textvqa/train_images \
    --answers-file ./playground/data/eval/textvqa/answers/llava-v1.5-7b.jsonl \
    --temperature 0 \
    --conv-mode vicuna_v1

python -m llava.eval.eval_textvqa \
    --annotation-file ./playground/data/eval/textvqa/TextVQA_0.5.1_val.json \
    --result-file ./playground/data/eval/textvqa/answers/llava-v1.5-7b.jsonl

--model-base /root/home/workspace/LLM/vicuna/lmsys/vicuna-7b-v1.5 is the vicuna-7b model downloaded from Hugging Face, and --model-path /root/home/workspace/LLM/llava/llava-v1.5-7b is the original LLaVA checkpoint downloaded from here. The output is shown below:

Loading LLaVA from base model...
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:53<00:00, 26.54s/it]
Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at /root/home/workspace/LLM/vicuna/lmsys/vicuna-7b-v1.5 and are newly initialized: ['model.mm_projector.2.weight', 'model.mm_projector.2.bias', 'model.mm_projector.0.bias', 'model.mm_projector.0.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/root/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
/root/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embeding dimension will be 32000. This might induce some performance reduction as *Tensor Cores* will not be available. For more details  about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
  0%|                                                                                                                                                               | 0/5000 [00:00<?, ?it/s]/root/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:362: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
/root/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:367: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5000/5000 [1:03:37<00:00,  1.31it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2149.71it/s]
Samples: 5000
Accuracy: 5.29%

Could anyone help? Thanks in advance.

liuxiaozhu01 commented 2 months ago

Well, when I remove --model-base, the results become normal. Once --model-base is set, the LLM parameters are loaded from model_base. It seems that using the original LLM parameters (from lmsys/vicuna-7b-v1.5 here) hurts LLaVA's accuracy and also slows down inference. Why?
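
One possible reading of the log above, stated as an assumption rather than a confirmed diagnosis: the plain Vicuna checkpoint has no mm_projector weights, so when the model is built from --model-base those parameters are reported as "newly initialized", and the language-model weights come from Vicuna rather than from the fine-tuned LLaVA checkpoint. The sketch below only illustrates how such a "missing keys" list arises. The helper name is hypothetical, and the key lists are shortened examples modelled on the warning in the log, not read from a real checkpoint.

```python
def newly_initialized(expected_keys, checkpoint_keys):
    """Return parameter names the model expects but the checkpoint lacks.

    Parameters missing from the checkpoint keep their random initial
    values unless they are loaded from somewhere else afterwards.
    """
    return sorted(set(expected_keys) - set(checkpoint_keys))


# Hypothetical, shortened key lists for illustration only.
expected = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.mm_projector.0.weight",
    "model.mm_projector.0.bias",
    "model.mm_projector.2.weight",
    "model.mm_projector.2.bias",
]
base_llm_checkpoint = ["model.layers.0.self_attn.q_proj.weight"]

missing = newly_initialized(expected, base_llm_checkpoint)
print(missing)
```

If the projector that maps image features into the LLM is left at random values, the model effectively cannot see the image, which would be consistent with the near-zero accuracy reported here.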