Can an online assessment yield a score, or can the process of an offline assessment be visualized?

OSU-NLP-Group / SeeAct

[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).

https://osu-nlp-group.github.io/SeeAct/



Can an online assessment yield a score, or can the process of an offline assessment be visualized? #22

Open Tangent-90C opened 4 months ago

Tangent-90C commented 4 months ago


I wanted to visualize the model's actions on the Mind2Web dataset, but SeeAct doesn't seem to support that. In online evaluation, the "success_or_not" field in the output is always empty, so there is no way to tell whether the task succeeded. Offline evaluation does not let you visualize the model's actions.

There is an Action_history, but it is not human-readable. Could SeeAct support visualizing the model's actions? Or could you point me to anyone who has implemented this feature in their own code?
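
To illustrate the kind of readable output I have in mind, here is a minimal sketch. It assumes each step is a dict with the hypothetical keys `action`, `element`, and `value`; these names and the sample records are made up for illustration and are not SeeAct's actual Action_history format.

```python
from typing import Dict, List


def format_action_history(steps: List[Dict[str, str]]) -> str:
    """Render a list of action records as numbered, human-readable lines.

    The keys "action", "element", and "value" are hypothetical; they are
    not taken from SeeAct's actual output format.
    """
    lines = []
    for i, step in enumerate(steps, start=1):
        action = step.get("action", "UNKNOWN")
        element = step.get("element", "<no element>")
        value = step.get("value", "")
        line = f"Step {i}: {action} on {element}"
        # Only mention a value when the step carries one (e.g. typed text).
        if value:
            line += f" with value '{value}'"
        lines.append(line)
    return "\n".join(lines)


# Made-up example records, for illustration only.
example_steps = [
    {"action": "CLICK", "element": "button 'Search'"},
    {"action": "TYPE", "element": "input 'Destination'", "value": "Columbus"},
]
print(format_action_history(example_steps))
```

Something along these lines, ideally paired with the screenshot for each step, would already make the trajectories much easier to inspect.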