OSU-NLP-Group / SeeAct

[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
https://osu-nlp-group.github.io/SeeAct/

Mistake or workaround? #34

Closed manuel-delverme closed 3 months ago

manuel-delverme commented 3 months ago

https://github.com/OSU-NLP-Group/SeeAct/blob/fcba5ba2fe6a961bb5696e70aa829db68ce500df/seeact_package/seeact/agent.py#L406-L407

Why do we need to call fill twice?

boyugou commented 3 months ago

On some elements of some websites, the text is not actually filled in if `fill` is called only once. I don't know why.

(It would be great if someone could tell me why. As far as I remember, the case I tested was the zip code field on Thumbtack.)
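
An alternative to calling `fill` twice unconditionally might be to fill once, read the value back, and retry only when it did not stick. The following is a minimal sketch, not SeeAct's code: it assumes a Playwright-style async locator exposing `fill` and `input_value`, and the names `fill_with_verify`, `max_attempts`, and `FlakyField` are hypothetical. It does not explain why the first fill is dropped; it only makes the retry explicit.

```python
import asyncio


async def fill_with_verify(locator, text, max_attempts=2):
    """Fill `locator`, then retry if the value did not stick.

    Assumption: `locator` exposes async `fill(value)` and `input_value()`,
    as Playwright's async Locator does. Returns the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        await locator.fill(text)
        # Read the field back to check whether the fill took effect.
        if await locator.input_value() == text:
            return attempt
    raise RuntimeError(f"value not set after {max_attempts} attempts")


class FlakyField:
    """Hypothetical stand-in for a real locator: ignores the first fill."""

    def __init__(self):
        self.value = ""
        self.calls = 0

    async def fill(self, text):
        self.calls += 1
        if self.calls > 1:  # only the second and later fills take effect
            self.value = text

    async def input_value(self):
        return self.value
```

With a real page, `locator` would be the element handle or locator that the agent already resolved. The stand-in class is there only so the sketch can be run without a browser.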