Closed yejipark-m closed 6 months ago
Hi, thanks for your interest. Please refer to "Implementation Details" in Section 4.1 and Appendix A for the hyperparameter settings of each experiment.
Thanks for your reply. I had misused the noise step when evaluating on POPE. I'll close the issue.
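For anyone hitting the same mistake: the noise step controls how far the contrastive image is pushed along the forward diffusion process, so an overly large (or small) value changes the distorted input VCD contrasts against. A minimal sketch of that distortion, assuming a standard linear beta schedule (the function and schedule names here are my own, not necessarily the repo's exact implementation):

```python
import torch

def add_diffusion_noise(image, noise_step, total_steps=1000):
    # Forward diffusion q(x_t | x_0): blend the image toward Gaussian noise.
    # A larger noise_step moves the image further toward pure noise.
    betas = torch.linspace(1e-4, 0.02, total_steps)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    a_bar = alphas_cumprod[noise_step]          # cumulative signal fraction
    noise = torch.randn_like(image)
    return a_bar.sqrt() * image + (1.0 - a_bar).sqrt() * noise
```

With `noise_step=0` the output is nearly the original image; near `total_steps - 1` it is almost pure Gaussian noise, which is why the chosen step matters so much for the contrastive branch.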
Hello, thanks for sharing your nice work.
I've encountered a problem when trying to reproduce the reported results, even after accounting for the standard deviations. I'm currently using the model checkpoints from Hugging Face:
model_paths[instructblip]="~/.cache/huggingface/hub/models--lmsys--vicuna-7b-v1.1"
model_paths[llava]="liuhaotian/llava-v1.5-7b"
model_paths[qwenvl]="Qwen/Qwen-VL"
I've kept the hyperparameters at their default settings:
python3 eval/object_hallucination_vqa_${model}.py --model-path ${model_paths[$model]} --question-file data/POPE/aokvqa/aokvqa_pope_${type}.json --image-folder data/MSCOCO/val2014 --answers-file ./output/${model}/aokvqa_pope_${type}_vcd.jsonl --use_cd
However, with these checkpoints and hyperparameters, the output numbers are significantly lower than the reported performance, particularly on the GQA and A-OKVQA datasets. The results without VCD roughly match the reported numbers; the discrepancy appears only when VCD is enabled.
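For comparison, here is my understanding of the decoding step the `--use_cd` flag enables, based on the paper's description (a sketch; the function name is mine, and `alpha`/`beta` follow the paper's notation rather than the repo's argument names):

```python
import torch

def vcd_logits(logits_clean, logits_noisy, alpha=1.0, beta=0.1):
    # Contrastive decoding: amplify the clean-image logits against the
    # logits obtained from the diffusion-noised image.
    contrast = (1.0 + alpha) * logits_clean - alpha * logits_noisy
    # Adaptive plausibility constraint: only keep tokens whose clean-image
    # probability is at least beta times the most likely token's probability.
    probs = logits_clean.softmax(dim=-1)
    cutoff = beta * probs.max(dim=-1, keepdim=True).values
    return contrast.masked_fill(probs < cutoff, float("-inf"))
```

If the noise step or `alpha` used for the noisy branch differs from the paper's setting, this subtraction can easily hurt rather than help, which might explain the gap I'm seeing.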
Could you specify which checkpoints you used for each model? Sharing the exact setup or recipe for correct inference would be greatly appreciated.