CRIPAC-DIG / LogicCheckGPT

[ACL 2024] Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models. Enables an LVLM to detect and mitigate its own object hallucinations through logical closed loops.

About baseline #2

Open haohaodw opened 3 months ago

haohaodw commented 3 months ago

Nice work! I would like to ask a question about LURE. LURE needs to mask the object during inference and then rewrite the response. However, POPE and MME are discriminative tasks that are answered with Yes/No. How do you test the performance of LURE on these two datasets?

Hyperwjf commented 3 months ago

Thanks for your interest! In our experiments, we observed that the responses of the four LVLMs to POPE questions follow the format "Yes/No, there is/isn't {object} ...". This format allows LURE to mask the object. For instance, the responses of mPLUG-Owl to some POPE questions are shown below:

[screenshot: mPLUG-Owl responses to POPE questions]

The responses of LLaVA-1.5 to some POPE questions are listed below:

[screenshot: LLaVA-1.5 responses to POPE questions]
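A minimal sketch of what this masking step could look like on responses in that format. The `mask_object` helper, the regex, and the `[IDK]` placeholder are illustrative assumptions for this sketch, not the exact LURE or LogicCheckGPT implementation:

```python
import re

# Illustrative masking of the object in a POPE-style response of the form
# "Yes/No, there is/isn't {object} ...". The regex and the "[IDK]"
# placeholder are assumptions for this sketch, not LURE's exact code.
OBJECT_PATTERN = re.compile(
    r"(there\s+(?:isn't|is\s+not|aren't|are\s+not|is|are)\s+(?:an|a|no)?\s*)"
    r"([\w-]+(?:\s[\w-]+)*?)"  # object noun phrase (lazy, allows multi-word)
    r"(\s+in\b|[.!,])",        # stop at "in ..." or sentence punctuation
    re.IGNORECASE,
)

def mask_object(response: str, placeholder: str = "[IDK]") -> str:
    """Replace the mentioned object with a placeholder so a revisor can rewrite it."""
    return OBJECT_PATTERN.sub(lambda m: m.group(1) + placeholder + m.group(3), response)

print(mask_object("Yes, there is a dog in the image."))
# Yes, there is a [IDK] in the image.
print(mask_object("No, there isn't a giraffe in the image."))
# No, there isn't a [IDK] in the image.
```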

haohaodw commented 3 months ago

However, POPE accuracy is computed over Yes/No answers. So after LURE rewrites a response, how do you judge whether the modified response counts as Yes or No, i.e., whether it is correct?
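For context, POPE accuracy is typically computed by first mapping each free-form response to a binary label via keyword matching, then comparing against the ground truth. A minimal sketch, assuming a simple negation-cue heuristic (the function names and cue list here are illustrative, not from this repo):

```python
# Illustrative POPE-style scoring: map each (possibly LURE-rewritten)
# free-form response to a yes/no label, then compare against gold labels.
# The negation-cue heuristic below is an assumption for this sketch.

def to_binary_label(response: str) -> str:
    """Map a free-form answer to "yes"/"no" via simple keyword cues."""
    text = f" {response.strip().lower()} "
    # Negation cues take priority so "No, there isn't ..." stays "no"
    # even though the object is still mentioned later in the sentence.
    if text.startswith(" no") or "isn't" in text or " not " in text:
        return "no"
    return "yes"

def pope_accuracy(responses: list[str], gold_labels: list[str]) -> float:
    """Fraction of responses whose parsed label matches the gold yes/no label."""
    preds = [to_binary_label(r) for r in responses]
    return sum(p == g for p, g in zip(preds, gold_labels)) / len(gold_labels)

print(pope_accuracy(
    ["Yes, there is a dog in the image.", "No, there isn't a giraffe in the image."],
    ["yes", "no"],
))  # 1.0
```

Under this kind of scheme, whether a LURE-rewritten response is counted correct reduces to whether its parsed yes/no label matches the POPE ground truth.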