OSU-NLP-Group / SeeAct

[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
https://osu-nlp-group.github.io/SeeAct/

Evaluation for element attributes and image annotations #13

Closed cc13qq closed 6 months ago

cc13qq commented 7 months ago

Thank you for your newly updated files! I have successfully generated screenshots.

However, I'm still curious about how to evaluate the element attributes and image annotations, since the outputs of SeeAct are not predicted choices. I couldn't find the ground truth for these two splits.

Are you also planning to release the evaluation code for element attributes and image annotations?

Looking forward to your reply.

boyugou commented 7 months ago

The evaluation for image_annotation is identical to text_choice, since both use the same choice candidates. Regarding element_attributes, the evaluation is also described in the paper. You are free to improve on the search algorithm for element_attributes used in the paper, or to combine it with other grounding strategies. (See the further discussion in my replies in #14.)

Regarding future upgrades to the SeeAct codebase, see my reply in #14.

Thanks for your interest in our work!