YiyangZhou / POVID

[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Apache License 2.0

Failed to reproduce the score [POPE, ScienceQA] #8

Open Wang-Xiaodong1899 opened 3 months ago

Wang-Xiaodong1899 commented 3 months ago

Hi, thanks for your great work!

I want to evaluate your released model, so I downloaded the two LoRA models:

stage_one_lora: https://huggingface.co/YiyangAiLab/llava_POVID_stage_one_lora
stage_two_lora: https://huggingface.co/YiyangAiLab/llava_POVID_stage_two_lora

I then merged the LoRA models with LLaVA-v1.5-7b step by step:

python merge_lora_weights.py --model-path llava_POVID_stage_one_lora --model-base [llava1.5 7b] --save-model-path stage_one_merged
python merge_lora_weights.py --model-path llava_POVID_stage_two_lora --model-base llava_POVID_stage_one_merged --save-model-path stage_two_merged
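
One thing worth ruling out in a chained merge like this is a path mismatch between the two steps. Below is a minimal sketch, assuming the same `merge_lora_weights.py` flags used above. The helper name `build_merge_commands` is hypothetical, the directory names are only illustrative, and `[llava1.5 7b]` remains a placeholder for the local base model path. It only builds and prints the commands; it does not run them.

```python
import shlex


def build_merge_commands(base_model, stage_one_lora, stage_two_lora,
                         stage_one_out="stage_one_merged",
                         stage_two_out="stage_two_merged"):
    """Return the two merge commands as argument lists (hypothetical helper).

    The stage-two --model-base reuses the stage-one --save-model-path value,
    so the two steps cannot point at different directories.
    """
    script = ["python", "merge_lora_weights.py"]
    stage_one = script + ["--model-path", stage_one_lora,
                          "--model-base", base_model,
                          "--save-model-path", stage_one_out]
    stage_two = script + ["--model-path", stage_two_lora,
                          "--model-base", stage_one_out,
                          "--save-model-path", stage_two_out]
    return [stage_one, stage_two]


commands = build_merge_commands(
    base_model="[llava1.5 7b]",  # placeholder, replace with the real path
    stage_one_lora="llava_POVID_stage_one_lora",
    stage_two_lora="llava_POVID_stage_two_lora",
)
for cmd in commands:
    # shlex.join quotes arguments that contain spaces or brackets
    print(shlex.join(cmd))
```

Printing the commands before running them makes it easy to confirm that the second merge starts from the output of the first.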

Then I evaluated stage_two_merged on the POPE and ScienceQA benchmarks, but I only got:

POPE: 85.96
ScienceQA: 67.72

The paper results are:

POPE: 86.90
ScienceQA: 68.8
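
To make the size of the gap explicit, here is a small sketch that computes the difference for each benchmark from the numbers quoted above. The variable names are illustrative only.

```python
# Scores quoted in this issue: reproduced run vs. the paper
reproduced = {"POPE": 85.96, "ScienceQA": 67.72}
reported = {"POPE": 86.90, "ScienceQA": 68.8}

# Gap in points for each benchmark, rounded to two decimals
gaps = {name: round(reported[name] - reproduced[name], 2) for name in reported}

for name, gap in gaps.items():
    print(f"{name}: reproduced {reproduced[name]:.2f}, "
          f"paper {reported[name]:.2f}, gap {gap:.2f}")
```

Both benchmarks come out roughly one point below the paper's numbers.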

Is there a problem with my process? Could you give me some advice?

Thanks a lot!