BradyFU / Woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

GroundingDINO's high false positive rate makes it difficult to detect object hallucination #13

Open ipheiman opened 3 weeks ago

ipheiman commented 3 weeks ago

Hi authors, amazing work on post-hoc hallucination mitigation! I am excited to try it out!

I am looking at the individual steps and experimenting with GroundingDINO, but I found that it tends to produce false positives, which is counterproductive when the goal is to flag hallucinated objects. This issue has also been raised in GroundingDINO's repo: https://github.com/IDEA-Research/GroundingDINO/issues/84

I was wondering if you encountered something similar when developing your work. It would be great to hear your thoughts on this, thanks!!
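
For reference, here's a minimal sketch of how I'm running detection (the config/checkpoint paths, example image, query, and threshold values are placeholders; `load_model`, `load_image`, and `predict` are the inference helpers from the GroundingDINO README):

```python
# Minimal sketch using the inference helpers from the GroundingDINO README.
# Paths, the example image, and the query are placeholders.
from groundingdino.util.inference import load_model, load_image, predict

model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",  # config shipped with the repo
    "weights/groundingdino_swint_ogc.pth",              # pretrained SwinT checkpoint
)

image_source, image = load_image("example.jpg")

# Query an object that is NOT in the image; with default thresholds,
# GroundingDINO often still returns a confident box (a false positive).
# The repo's convention is lowercase categories separated by " . ".
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="umbrella .",
    box_threshold=0.35,   # raising this suppresses some false positives...
    text_threshold=0.25,  # ...but set it too high and real objects get missed
)

print(phrases, logits)  # non-empty output here means a spurious detection
```

Tuning `box_threshold` up helps a little, but there doesn't seem to be a value that kills the false positives without also dropping real objects.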

xjtupanda commented 3 weeks ago

Thanks for your interest in our work!

We've run into similar issues but haven't found a perfect solution, since the problem is inherent to the detection model. Most of the time, though, GroundingDINO works just fine.

I think this is a gap where you could make improvements. You may try using a VQA model (e.g., BLIP-2) to double-check whether an object really exists. Or revise the pipeline, e.g., perform a global detection first (for instance with recognize-anything) and then check whether each claimed object matches any of the resulting tags. A sketch of the VQA double-check idea follows.
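
As a rough illustration of the first idea, here is a minimal sketch using the Hugging Face transformers BLIP-2 API (the checkpoint name, prompt wording, and the `object_exists` helper are illustrative choices, not part of Woodpecker's pipeline):

```python
# Sketch: use BLIP-2 as a yes/no oracle to double-check a detection.
# Checkpoint name and prompt wording are illustrative; adjust to your setup.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

def object_exists(image: Image.Image, obj: str) -> bool:
    """Ask BLIP-2 whether `obj` is present in the image."""
    prompt = f"Question: Is there a {obj} in the image? Answer:"
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(
        "cuda", torch.float16
    )
    out = model.generate(**inputs, max_new_tokens=5)
    answer = processor.batch_decode(out, skip_special_tokens=True)[0].lower()
    # Depending on the transformers version, the decoded text may echo the
    # prompt, so check the part after "answer:" rather than the whole string.
    return "yes" in answer.split("answer:")[-1]

# Keep a GroundingDINO detection only if BLIP-2 agrees the object exists.
image = Image.open("example.jpg")
for phrase in ["dog", "umbrella"]:  # phrases returned by the detector
    print(phrase, object_exists(image, phrase))
```

Cross-checking two models this way trades extra inference cost for fewer false positives, since the detector and the VQA model tend to fail on different examples.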