abhika-m / FAVA


Size of the test set (annotations.json) #3

Open qishenghu opened 6 months ago

qishenghu commented 6 months ago

Thanks for the good work.

According to the arXiv paper, there should be approximately 1,000 annotated responses for the FAVABENCH detection task. But it seems the 'annotations.json' file from this link (https://huggingface.co/datasets/fava-uw/fava-data/tree/main) contains only 460 records. Could you kindly help me understand which file contains the correct annotated responses for FAVABENCH?
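For reference, here is a minimal sketch of how a record count like this can be checked. It assumes a local copy of 'annotations.json' whose top level is a JSON array (or object) of records; that layout is an assumption, not something confirmed in this thread, and `count_records` is a hypothetical helper name.

```python
import json
from pathlib import Path


def count_records(path):
    """Return the number of top-level records in a JSON file.

    Assumes the top level is a JSON array (one element per record)
    or a JSON object (one key per record).
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, (list, dict)):
        return len(data)
    raise ValueError("expected a JSON array or object at the top level")


if __name__ == "__main__":
    # Assumed location of a manually downloaded copy of the file.
    local_copy = Path("annotations.json")
    if local_copy.exists():
        print(count_records(local_copy))
```

If the file turns out to nest its records under a key, the count would need to be taken on that nested list instead.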

Thanks!

khunkin commented 6 months ago

The 'annotations.json' file from this link (submitted on Jan 15, 2024) should correspond to version 2 (submitted on Jan 17, 2024) of the arXiv submission, which states: "Our benchmark consists of about 400 responses of ChatGPT and Llama2-Chat 70B." Version 3 (submitted on Feb 21, 2024) instead mentions: "...annotating approximately 1,000 responses of three widely used LMs." As of Apr 18, 2024, the authors have not updated the data to match the latest version. Hopefully they will update it.

Thanks!