FuxiaoLiu / LRV-Instruction

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
https://fuxiaoliu.github.io/LRV/
BSD 3-Clause "New" or "Revised" License

A question about "hallucination in captions" #23

Closed: bigbrother001 closed this issue 5 months ago

bigbrother001 commented 5 months ago

One of the reviewers commented that this work addresses "yes or no" hallucination rather than the general hallucination problem, e.g., hallucination in captions. I'm not quite clear about this comment. Could you explain what "general hallucination problem" and "hallucination in captions" mean? I'd like to hear your opinion.

FuxiaoLiu commented 5 months ago

The reviewer's comment means that our model is good at the discriminative task (yes/no). LLaVA and MiniGPT-4 tend to answer "yes" to most questions, while our model is more honest and knows when to answer "no". If the question is not a discriminative one, e.g., a summarization task or multiple-choice QA, our model may not handle it as well. For example, suppose the question is "Describe this image."

The answer might be: "The scene takes place in an urban area, where a bicycle is parked next to a trash bin. The bicycle is situated on the right side of the image, with its front wheel slightly turned. There are several other people in this area, walking and standing around at various distances from the trash bin."

If there are actually no people in the image, that is the general hallucination problem: the generated caption mentions objects that do not exist in the image, i.e., hallucination in captions.
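For what it's worth, this kind of caption-level (generative) hallucination is often measured by checking whether the objects mentioned in a generated description actually appear in the image's ground-truth annotations, in the spirit of the CHAIR metric. Below is a minimal, hypothetical sketch; the vocabulary, function name, and annotation set are made up for illustration and are not part of this repo:

```python
from typing import List, Set

def hallucinated_objects(caption: str, ground_truth_objects: Set[str]) -> List[str]:
    """Return object words mentioned in the caption that are absent from the
    image's ground-truth annotations (a rough, word-level check)."""
    # Small illustrative vocabulary of object nouns to scan for; a real
    # evaluation would use a full object list plus synonym/plural mapping.
    vocabulary = {"bicycle", "trash bin", "person", "people", "car", "dog"}
    mentioned = {obj for obj in vocabulary if obj in caption.lower()}
    return sorted(obj for obj in mentioned if obj not in ground_truth_objects)

caption = (
    "The scene takes place in an urban area, where a bicycle is parked next to "
    "a trash bin. There are several other people in this area."
)
# Suppose the annotations say the image only contains a bicycle and a trash bin.
annotations = {"bicycle", "trash bin"}
print(hallucinated_objects(caption, annotations))  # ['people']  -> hallucinated
```

The discriminative setting the paper targets instead asks "Is there a person in the image?" and only needs a yes/no answer, which is why the two problems are evaluated differently.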

bigbrother001 commented 5 months ago

Thank you very much for your explanation. I think I understand it now.