DavidMChan / aloha

A new reliable, localizable, and generalizable metric for hallucination detection in image captioning models.
question about table 2 (FOIL, nocaps-FOIL) #2

long8v closed 3 months ago

long8v commented 3 months ago

Hello there. I have a question about Table 2. I am unsure whether AP in Table 2 is the accuracy with which a metric (ALOHa, CLIPScore) assigns a higher score to the non-FOIL caption than to the FOIL one, following the procedure in Section 4.4 of the CLIPScore paper (https://arxiv.org/pdf/2104.08718).

(Section 4.4 of "CLIPScore: A Reference-free Evaluation Metric for Image Captioning")

we sample a (FOIL, true) pair, and compute the accuracy of each evaluation metric in their capacity to assign a higher score to the true candidate versus the FOIL.

long8v commented 3 months ago

I found that it is AP, not accuracy. Issue closed!
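For readers who land here with the same confusion, the two quantities can be sketched side by side. This is a minimal illustration, not code from the ALOHa or CLIPScore repositories. It assumes that AP stands for average precision, that a higher score means a better caption, and that the correct captions are the positive class; the function names and the toy scores are made up for the example.

```python
from typing import Sequence


def pairwise_accuracy(true_scores: Sequence[float],
                      foil_scores: Sequence[float]) -> float:
    """Fraction of (true, FOIL) pairs where the true caption scores strictly higher.

    This mirrors the pairwise procedure quoted from the CLIPScore paper.
    """
    if len(true_scores) != len(foil_scores) or not true_scores:
        raise ValueError("need two non-empty score lists of equal length")
    wins = sum(t > f for t, f in zip(true_scores, foil_scores))
    return wins / len(true_scores)


def average_precision(scores: Sequence[float],
                      is_positive: Sequence[bool]) -> float:
    """Average precision over one ranked list of all captions.

    Captions are sorted by score (highest first); AP is the mean of
    precision@k taken at the rank of each positive example.
    """
    ranked = sorted(zip(scores, is_positive), key=lambda p: p[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("no positive examples")
    return sum(precisions) / len(precisions)


# Hypothetical metric scores for four (true, FOIL) caption pairs.
true_scores = [0.9, 0.6, 0.8, 0.7]
foil_scores = [0.4, 0.65, 0.3, 0.5]

acc = pairwise_accuracy(true_scores, foil_scores)

# For AP the pairing is dropped: all captions are pooled and ranked together.
# Assumption: correct captions are labeled positive.
pooled_scores = true_scores + foil_scores
pooled_labels = [True] * len(true_scores) + [False] * len(foil_scores)
ap = average_precision(pooled_scores, pooled_labels)

print(f"pairwise accuracy = {acc:.2f}, average precision = {ap:.2f}")
```

On the same scores the two numbers differ, because pairwise accuracy only compares each true caption with its own FOIL, while AP depends on the ranking of all captions together. Which class counts as positive in the paper's Table 2 is not stated in this thread, so treat the labeling above as an assumption.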