DavidMChan / aloha

A new reliable, localizable, and generalizable metric for hallucination detection in image captioning models.

plan for releasing dataset (HAT, nocaps-FOIL) #1

Closed long8v closed 3 months ago

long8v commented 3 months ago

Thank you for your great work! I was wondering if you might have any plans to release the test set for the HAT and nocaps-FOIL datasets? Access to those test sets would be incredibly helpful for the research I'm currently conducting. I would greatly appreciate any information you're able to provide. Thank you in advance for your consideration.

DavidMChan commented 3 months ago

Thanks for reaching out!

The HAT trainval set which we used for the paper is already available here: https://github.com/DavidMChan/aloha/blob/main/data/hat-trainval.json

The nocaps FOIL validation dataset which we used for the paper is available here: nocaps-val-foil.json

Feel free to reach out if there are other dataset artifacts you are interested in!

long8v commented 3 months ago

Thank you for the super quick response! 😄 In addition, I was wondering if it would be possible to also obtain the test sets for the HAT and nocaps-FOIL datasets. Based on the table captions, it seems Table 1 refers to the HAT test set, and Table 2 likely uses the test sets as well (since the FOIL dataset does not have a validation split, as far as I know).

DavidMChan commented 3 months ago

Sorry for the confusion. To clarify, these two files are the test sets used in the paper - it is just a naming inconsistency (as there is additional HAT data that we are not releasing at this time).


long8v commented 3 months ago

I think the nocaps-val-foil.json you provided in this issue is the data used in the paper, which has 2500 instances. But the https://github.com/DavidMChan/aloha/blob/main/data/hat-trainval.json dataset has only 82 instances when I open it, while the paper says the test set has 400 instances. Could you check whether I did something wrong?
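For anyone who wants to reproduce the instance counts mentioned above, here is a minimal sketch of how one might count the entries in a downloaded JSON file. It assumes the file is either a top-level JSON list of samples or a dict keyed by sample id; the actual layout of the released files is not described in this thread, and `count_instances` is a hypothetical helper name.

```python
import json


def count_instances(path):
    """Return the number of top-level entries in a JSON file.

    Assumption: the file is a JSON list of samples or a dict keyed by
    sample id. Any other top-level type is rejected.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, (list, dict)):
        return len(data)
    raise TypeError(f"Unexpected top-level JSON type: {type(data).__name__}")
```

Calling `count_instances("hat-trainval.json")` on a local copy should then give a number to compare against the count reported in the paper.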

DavidMChan commented 3 months ago

Whoops! Looks like I uploaded a subset of the full dataset. The correct set is here (and I've updated the released JSONs):

hat-trainval.json

long8v commented 3 months ago

Thank you so much!!!! 😄

long8v commented 3 months ago

Just to be clear: to estimate AP, is it right to use nocaps-val-foil.json twice, once with "foil" and once with "baseline" (not foil), so that the ratio between FOIL and non-FOIL samples is 1:1?
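To make the question concrete, here is a rough sketch of the 1:1 setup being described. It is not confirmed as the procedure used in the paper. The field names `"foil"` and `"baseline"` are taken from the question above, and treating them as per-entry caption fields is an assumption. The helper names are hypothetical, and the scores are assumed to be oriented so that a higher score means "more likely hallucinated".

```python
def build_balanced_pairs(entries, foil_key="foil", baseline_key="baseline"):
    """Use each entry twice: once as a positive (foil), once as a negative.

    Assumption: every entry carries both a foil caption and a baseline
    caption, which yields a 1:1 ratio of FOIL to non-FOIL samples.
    """
    captions, labels = [], []
    for entry in entries:
        captions.append(entry[foil_key])
        labels.append(1)  # foil caption: contains a hallucination
        captions.append(entry[baseline_key])
        labels.append(0)  # baseline caption: no hallucination
    return captions, labels


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive appears.

    Samples are ranked by descending score. Ties keep their input order
    here, so the result can differ slightly from library implementations
    that group tied scores together.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0
```

With a scoring function of your own, the captions returned by `build_balanced_pairs` would be scored one by one and the resulting list passed to `average_precision` together with the labels. If the metric assigns lower scores to hallucinated captions, the scores would need to be negated first.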