[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
While all of this information is nice and all, is there a comparison to experiments without grounding? It can be argued that grounding may hurt performance without knowing what the performance without grounding is.
While all of this information is nice and all, is there a comparison to experiments without grounding? It can be argued that grounding may hurt performance without knowing what the performance without grounding is.