TRI-ML / vlm-evaluation

VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning

Inconsistent POPE expected number of examples #15

Open iancovert opened 1 month ago

iancovert commented 1 month ago

Thank you for your work, this package has been very helpful! However, I noticed an issue with the expected number of examples for POPE when using the full version: the number of examples in the dataset does not match the count that the eval harness expects.

It seems like the best solution would be to use the most up-to-date POPE dataset, and then update this line in the eval harness to reflect the correct number of examples. I can make a PR if that sounds right, but I thought I'd run it by you first.
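
For anyone hitting the same mismatch, one way to verify the actual counts is to tally the question files from the official POPE repo directly. This is a minimal sketch, assuming a local checkout of RUCAIBox/POPE with the usual `output/coco/coco_pope_{split}.json` files in JSON Lines format (one question per line); the directory layout and file names are assumptions about that repo, not part of this harness.

```python
import json
from pathlib import Path

# Assumed layout of a local checkout of https://github.com/RUCAIBox/POPE;
# adjust POPE_DIR to wherever the repo was cloned.
POPE_DIR = Path("POPE/output/coco")
SPLITS = ["random", "popular", "adversarial"]

total = 0
for split in SPLITS:
    path = POPE_DIR / f"coco_pope_{split}.json"
    # Despite the .json extension, these files appear to be JSON Lines:
    # each non-empty line is one question record.
    with open(path) as f:
        n = sum(1 for line in f if line.strip())
    print(f"{split}: {n} examples")
    total += n

print(f"total: {total} examples")
```

Comparing the printed total against the expected-examples constant in the harness would confirm whether the hard-coded count or the dataset copy is out of date.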