Size of the test set (annotations.json)

The 'annotations.json' file from this link (submitted on Jan 15, 2024) should correspond to version 2 (submitted on Jan 17, 2024) of the arXiv submission, which states: "Our benchmark consists of about 400 responses of ChatGPT and Llama2-Chat 70B." Version 3 (submitted on Feb 21, 2024), however, mentions: "...annotating approximately 1,000 responses of three widely used LMs." As of Apr 18, 2024, the data has not been updated to match the latest version of the paper. I hope the authors can update it.
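For anyone who wants to check the size locally, here is a minimal sketch. It assumes that 'annotations.json' is a single JSON document whose top level is an array (or object) with one entry per annotated response; I have not confirmed that layout, and the helper name `count_responses` is only illustrative.

```python
import json
from pathlib import Path


def count_responses(path):
    """Return the number of top-level entries in a JSON file.

    Assumption: each top-level entry is one annotated response.
    The real file layout may differ.
    """
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, (list, dict)):
        return len(data)
    raise ValueError("Expected a JSON array or object at the top level")


# Example call (path is a placeholder for the downloaded file):
# print(count_responses("annotations.json"))
```

If the count comes out near 400 rather than near 1,000, that would be consistent with the file matching version 2 of the paper.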

Thanks!

abhika-m / FAVA
