Closed bigbrother001 closed 5 months ago
The reviewer's comment means that our model is good at the discriminative task (yes/no questions). LLaVA and MiniGPT-4 answer 'yes' to most questions, while our model is more honest and knows when to answer 'no'. If the task is not discriminative, such as summarization or multiple-choice QA, our model may not handle it very well. For example, suppose the question is "Describe this image."
The answer might be "The scene takes place in an urban area, where a bicycle is parked next to a trash bin. The bicycle is situated on the right side of the image, with its front wheels slightly turned. There are several other people in this area, walking and standing around at various distances from the trash bin."
If there are actually no people in the image, the caption has made them up. That is hallucination in captions, which is one instance of the general hallucination problem.
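To make the distinction concrete, here is a minimal toy sketch in Python. It is not the evaluation code from this work. The helper names, the object list, and the small vocabulary are all made up for illustration, and the caption check is a naive word match.

```python
# Toy illustration only: function names, objects, and vocabulary are hypothetical.

def yes_no_hallucinated(answer, object_present):
    """Discriminative case: a 'yes' about an object that is not in the image."""
    said_yes = answer.strip().lower().startswith("yes")
    return said_yes and not object_present


def caption_hallucinated_objects(caption, objects_in_image, vocabulary):
    """Generative case: known object words in the caption that are absent from the image."""
    words = {w.strip(".,").lower() for w in caption.split()}
    mentioned = words & vocabulary
    return mentioned - objects_in_image


# Assumed ground truth for the example image: no people are present.
objects_in_image = {"bicycle", "bin"}
vocabulary = {"bicycle", "bin", "people", "car"}
caption = (
    "A bicycle is parked next to a trash bin. "
    "There are several other people in this area."
)

# Yes/no hallucination: answering 'yes' to "Are there people in the image?"
print(yes_no_hallucinated("Yes, there are people.", object_present=False))

# Caption hallucination: objects the free-form description invents.
print(caption_hallucinated_objects(caption, objects_in_image, vocabulary))
```

The first check only covers a single yes/no question, while the second looks at everything a free-form description mentions. That is why doing well on the first does not guarantee doing well on the second.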
Thank you very much for your explanation. I think I understand it now.
One of the reviewers commented that this work addresses "yes or no" hallucination rather than the general hallucination problem, e.g., hallucination in captions. I'm not clear about this comment. Could you explain what "general hallucination problem" and "hallucination in captions" mean? I look forward to hearing your opinion.