Hi! Thanks for this PR. This looks clean and well-formed.
But I have a suggestion: it would be better to add a README.md file inside the lmms_eval/tasks/screenspot
folder. Since we are adding more and more tasks, we will gradually need a brief introduction for each task.
The format can follow: https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/arc
Makes sense! I've generated a README for this task now. It covers the information from the linked format and adds a bit about the metrics used as well. Let me know if the formatting should be different or if this suffices.
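For reference, a rough skeleton of that README format (section headings follow the lm-evaluation-harness task README template; the placeholders are illustrative and not the actual contents of the README added in this PR):

```markdown
# ScreenSpot

### Paper

Title: <paper title>

Abstract: <one-paragraph description of the benchmark>

Homepage: <link to the dataset or paper homepage>

### Citation

<BibTeX entry for the paper>

### Groups and Tasks

#### Groups

* <group name>

#### Tasks

* <REC grounding task>
* <REG instruction-generation task>
```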
This PR adds the ScreenSpot benchmark from the SeeClick paper. This benchmark was designed for grounding (i.e., referring expression comprehension (REC)), but can also be inverted to yield an instruction generation (i.e., referring expression generation (REG)) benchmark. The data domain is screens: mobile, desktop, and web screenshots.
This PR simply adds ScreenSpot, mirroring the RefCOCO-style datasets for REC/REG.
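To illustrate the grounding side, below is a minimal sketch of the kind of point-in-box accuracy check commonly used for ScreenSpot-style REC evaluation (a prediction counts as correct if the predicted click point falls inside the ground-truth element's bounding box). The function names and the normalized `[x1, y1, x2, y2]` box format are assumptions for illustration, not the implementation in this PR.

```python
# Illustrative sketch only: ScreenSpot-style REC accuracy as "predicted click
# point lands inside the ground-truth bounding box". Names and the normalized
# [x1, y1, x2, y2] box format are assumptions, not the lmms-eval code.
from typing import Sequence


def point_in_bbox(point: Sequence[float], bbox: Sequence[float]) -> bool:
    """Return True if (x, y) lies inside [x1, y1, x2, y2]."""
    x, y = point
    x1, y1, x2, y2 = bbox
    return x1 <= x <= x2 and y1 <= y <= y2


def rec_accuracy(points: list, boxes: list) -> float:
    """Fraction of predicted points that fall inside their ground-truth boxes."""
    if not boxes:
        return 0.0
    hits = sum(point_in_bbox(p, b) for p, b in zip(points, boxes))
    return hits / len(boxes)


# Example: one prediction inside its box, one outside -> accuracy 0.5
preds = [(0.52, 0.31), (0.10, 0.90)]
golds = [(0.45, 0.25, 0.60, 0.40), (0.50, 0.10, 0.70, 0.30)]
print(rec_accuracy(preds, golds))  # 0.5
```

The REG direction is scored like RefCOCO-style generation tasks, with text-similarity metrics over the generated instruction rather than a spatial check.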
The attached results.json shows that LLaVA-v1.5-7B achieves: