Ruiyang-061X / VL-Uncertainty

Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".
https://vl-uncertainty.github.io/

Great work! #1

Open linzhiqiu opened 6 days ago

linzhiqiu commented 6 days ago

Hey,

I am Zhiqiu Lin, a final-year PhD student at Carnegie Mellon University working with Prof. Deva Ramanan. Your work is very interesting with great performance gains!

I wanted to share NaturalBench (NeurIPS'24 D&B) in case you are looking for more benchmarks:

NaturalBench (https://linzhiqiu.github.io/papers/naturalbench/) is a vision-centric benchmark that challenges vision-language models with pairs of simple questions about natural imagery. Unlike prior VQA benchmarks (like MME and ScienceQA), which blind language models (e.g., GPT-3.5) can solve, NaturalBench ensures such shortcuts won’t work. We evaluated 53 state-of-the-art models, and even top models like GPT-4o and Qwen2-VL fall 50%-70% short of human accuracy (90%+), revealing significant room for improvement.

We also found that current models show strong answer biases, such as favoring “Yes” over “No” regardless of the input. Correcting these biases can boost performance by 2-3x, even for GPT-4o, making NaturalBench a valuable testbed for future debiasing techniques.
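As a rough illustration of what such an answer bias looks like, the sketch below compares how often a model answers "Yes" with how often "Yes" is actually correct. It is a minimal, hypothetical example: the function names `yes_rate` and `yes_bias` and the toy data are assumptions for illustration, not NaturalBench's evaluation code or debiasing method.

```python
def yes_rate(answers):
    """Fraction of answers equal to 'yes' (ignoring case, spaces, trailing period)."""
    normalized = [a.strip().lower().rstrip(".") for a in answers]
    if not normalized:
        return 0.0
    return sum(1 for a in normalized if a == "yes") / len(normalized)


def yes_bias(predictions, labels):
    """Positive values mean the model says 'yes' more often than the labels do."""
    return yes_rate(predictions) - yes_rate(labels)


# Hypothetical toy data: the labels are balanced, the predictions are not.
predictions = ["Yes", "Yes", "yes", "No", "Yes", "Yes"]
labels = ["Yes", "No", "Yes", "No", "No", "Yes"]

bias = yes_bias(predictions, labels)
print(f"yes-rate gap: {bias:.2f}")
```

A gap near zero on a balanced yes/no set suggests little bias toward "Yes"; a large positive gap suggests the model answers "Yes" regardless of the input.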

Check out my Twitter post about it here: https://x.com/ZhiqiuLin/status/1848454555341885808.

🚀 Start using NaturalBench: https://github.com/Baiqi-Li/NaturalBench

Best, Zhiqiu

Ruiyang-061X commented 4 days ago

Hi Zhiqiu,

Thank you for your interest in our work! I find NaturalBench to be an excellent contribution, and I believe its focus can positively impact the field of LVLM evaluation. I plan to incorporate NaturalBench into our work and benchmark VL-Uncertainty with it. We will be updating our paper soon (to include the appendix), and I will make sure to cite NaturalBench properly in it as well. Thank you again for your interest!

Best regards, Ruiyang