foundation-multimodal-models / CAL

[NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
Apache License 2.0

Analysis of the hallucination benchmark results in the Appendix of your paper #1

Open: laserwave opened this issue 4 months ago

laserwave commented 4 months ago

Hi, nice work.

In Table 7, you report POPE results that decrease in some experimental settings (comparing with and without your method). Since your method assigns low weights to text tokens that contradict the image, I would expect it to improve scores on hallucination benchmarks.

Do you have any comments on this? Thank you.

Menoly-xin commented 4 months ago

Hi, I apologize for the delayed reply; I am currently occupied with graduation preparations and related travel.

Thank you for the thoughtful comment. In my view, POPE may not be the best benchmark for evaluating hallucination: its scores are already very high and vary little across settings, so it leaves limited room to distinguish methods. Other benchmarks may be more suitable for this kind of assessment (for more information, please refer to https://arxiv.org/pdf/2312.00849). After my vacation, I will add evaluation results on these benchmarks if possible.
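For reference, POPE poses binary yes/no questions about whether an object is present in an image and reports accuracy, precision, recall, F1, and the ratio of "yes" answers. The following is a minimal sketch of those metrics. It assumes model answers have already been normalized to the strings "yes" or "no" and treats "yes" (object present) as the positive class. The helper name `pope_metrics` is hypothetical and is not part of the CAL repository or the official POPE code.

```python
from typing import Dict, Sequence


def pope_metrics(preds: Sequence[str], labels: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: POPE-style metrics with "yes" as the positive class.

    Assumes every prediction and label is already normalized to "yes" or "no".
    """
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    pairs = list(zip(preds, labels))
    tp = sum(p == "yes" and y == "yes" for p, y in pairs)  # object present, model says yes
    fp = sum(p == "yes" and y == "no" for p, y in pairs)   # hallucinated object
    tn = sum(p == "no" and y == "no" for p, y in pairs)
    fn = sum(p == "no" and y == "yes" for p, y in pairs)
    total = len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Fraction of "yes" answers; values far from the label balance indicate bias.
        "yes_ratio": (tp + fp) / total if total else 0.0,
    }


if __name__ == "__main__":
    # Toy example: one hallucinated object ("yes" where the label is "no").
    preds = ["yes", "yes", "no", "yes"]
    labels = ["yes", "no", "no", "yes"]
    print(pope_metrics(preds, labels))
```

On this toy input, accuracy is 0.75, precision is 2/3, recall is 1.0, and F1 is 0.8. Because every question is binary and the scores are averages over many such questions, a model that is already near the ceiling can only move these numbers slightly.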