TRI-ML / vlm-evaluation

VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning

About the size of the POPE dataset #9

Closed: Hannibal046 closed this issue 2 months ago

Hannibal046 commented 2 months ago

Maybe we should change the number for the POPE dataset from 9000 to 8910? See:

https://github.com/TRI-ML/vlm-evaluation/blob/2092905d392e8dbedf01ed4b853df530e3cf9f35/vlm_eval/conf/datasets.py#L323-L324

https://github.com/TRI-ML/vlm-evaluation/blob/2092905d392e8dbedf01ed4b853df530e3cf9f35/vlm_eval/tasks/harnesses/pope.py#L58-L60

Hannibal046 commented 2 months ago

Duplicate: https://github.com/TRI-ML/vlm-evaluation/issues/2