[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
Moved the auto input template to a txt file and deleted the llm_engine and browser_helper function files, as they are subsets of gpt4v_api and browser_helper_robust, respectively #3