OSU-NLP-Group / SeeAct

[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
https://osu-nlp-group.github.io/SeeAct/

som branch has duplicate overlays #39

Closed by mlin12321 2 months ago

mlin12321 commented 2 months ago

The som branch (commit 6900592) creates duplicate element labels for certain actions on certain websites. For example, in tasks that involve scrolling down on Twitter, the labels drawn before the scroll persist afterwards. This appears to happen for actions that stay on the same page (e.g. scrolling).

duz-sg commented 2 months ago

This happens because the marks are cleared too late (only just before the next round of mark generation), so whenever the page changes (e.g. on scroll down), the old marks remain. The latest code clears the marks before any action, which should solve this issue.
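
For anyone reading later, here is a minimal sketch of the ordering difference described above. It is not SeeAct's actual code: `FakePage`, `draw_marks`, `clear_marks`, and both `step_*` functions are hypothetical names, and the "page" is a plain Python object rather than a real browser page. It only illustrates why clearing before the action avoids stale marks.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page (not a real browser API)."""
    scroll_y: int = 0
    overlays: list = field(default_factory=list)

    def visible_elements(self):
        # Pretend three elements are visible at each scroll position.
        return [f"el-{self.scroll_y + i}" for i in range(3)]


def draw_marks(page):
    """Add one overlay label per currently visible element."""
    page.overlays.extend(f"mark:{el}" for el in page.visible_elements())


def clear_marks(page):
    """Remove every overlay label from the page."""
    page.overlays.clear()


def step_late_clear(page, scroll_by):
    """Old ordering: clear only right before the next round of mark generation."""
    clear_marks(page)
    draw_marks(page)
    screenshot = list(page.overlays)  # what the model sees this round
    page.scroll_y += scroll_by        # action runs while marks are still on the page
    return screenshot


def step_early_clear(page, scroll_by):
    """New ordering: clear the marks before any action is executed."""
    draw_marks(page)
    screenshot = list(page.overlays)  # what the model sees this round
    clear_marks(page)
    page.scroll_y += scroll_by        # action runs on a page with no marks
    return screenshot


late = FakePage()
step_late_clear(late, scroll_by=3)
print(late.overlays)   # marks for el-0..el-2 are still attached after the scroll

early = FakePage()
step_early_clear(early, scroll_by=3)
print(early.overlays)  # empty: nothing stale is left on the page
```

In both orderings the model sees the same marks when the screenshot is taken; the difference is only in what is left on the page once the action has changed it.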

mlin12321 commented 2 months ago

That seems to fix it. Thanks!