[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
I wanted to visualize the model's actions on the Mind2Web dataset, but SeeAct doesn't seem to support that.
When running online evaluation, the "success_or_not" field in the output is always empty, so there is no way to tell whether a task succeeded.
Offline evaluation does not allow you to visualize model actions.
There is an Action_history, but it is not human-readable. Could SeeAct add support for visualizing the model's actions? Or, if someone has already implemented this in their own code, could you point me to it?
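In the meantime, here is a minimal sketch of how one might render the logged actions onto the saved screenshots. This is not part of SeeAct; it assumes the run produces a JSON action history (`action_history.json` is a hypothetical name) whose entries each contain a `screenshot` path, an optional `bounding_box` `[x1, y1, x2, y2]` for the target element, and an `action` string. Adjust the keys to match whatever your run actually writes.

```python
# Hedged sketch, not SeeAct's own API. Assumes a per-step JSON log with
# "screenshot", "bounding_box", and "action" keys (hypothetical schema).
import json
from pathlib import Path

from PIL import Image, ImageDraw


def visualize_actions(history_path: str, out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    steps = json.loads(Path(history_path).read_text())
    for i, step in enumerate(steps):
        img = Image.open(step["screenshot"]).convert("RGB")
        draw = ImageDraw.Draw(img)
        # Highlight the element the model acted on, if a box was logged.
        if step.get("bounding_box"):
            draw.rectangle(step["bounding_box"], outline="red", width=4)
        # Caption the frame with the predicted action, e.g. "CLICK [Search]".
        draw.text((10, 10), f"step {i}: {step['action']}", fill="red")
        img.save(out / f"step_{i:03d}.png")


visualize_actions("online_results/action_history.json", "viz")
```

The annotated frames can then be flipped through (or stitched into a GIF) to follow the trajectory step by step. Since the exact log format is an assumption here, verify the field names against the files your online run actually produces.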