OSU-NLP-Group / SeeAct

[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
https://osu-nlp-group.github.io/SeeAct/

Bug Report: Missing Previous Actions in MM-Mind2web Dataset #32

Closed: leoozy closed this issue 3 months ago.

leoozy commented 3 months ago

Thank you very much for your work. I think I have found a bug in your MM-Mind2Web dataset. Each data point contains only a list of selectable actions and no previous actions. This could cause problems during evaluation.

boyuanzheng010 commented 3 months ago

Thanks @leoozy. The previous actions can be recovered from action_reprs, which contains the full set of actions for a given task. MM-Mind2Web was originally designed to make step-wise actions easier to work with, so we did not include the action order.

We have updated Multimodal-Mind2Web on Hugging Face: https://huggingface.co/datasets/osunlp/Multimodal-Mind2Web

Each row now includes two new fields. target_action_index holds the index of the target action. target_action_reprs holds the corresponding action representation string.
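The following is a minimal sketch of how the previous actions for a row might be derived from these fields, in plain Python. It assumes each row is a mapping with three fields:

- action_reprs: the task's full action list, in execution order.
- target_action_index: a 0-based index, possibly stored as a string.
- target_action_reprs: the target action's representation string.

These layout details are assumptions and have not been checked against the dataset card. previous_actions is a hypothetical helper, and the toy row is made up for illustration.

```python
from typing import Any, List, Mapping


def previous_actions(row: Mapping[str, Any]) -> List[str]:
    """Return the actions taken before this row's target step.

    Hypothetical helper. Assumes row["action_reprs"] lists every action of
    the task in order and row["target_action_index"] is the 0-based
    position of this row's target action in that list.
    """
    actions = list(row["action_reprs"])
    # Assumption: the index may be stored as a string, so normalize to int.
    idx = int(row["target_action_index"])
    if not 0 <= idx < len(actions):
        raise ValueError(
            f"target_action_index {idx} out of range for {len(actions)} actions"
        )
    # Sanity check: the action at idx should equal target_action_reprs.
    expected = row.get("target_action_reprs")
    if expected is not None and actions[idx] != expected:
        raise ValueError(
            "action_reprs[target_action_index] does not match target_action_reprs"
        )
    # Everything before the target step counts as a previous action.
    return actions[:idx]


if __name__ == "__main__":
    # Toy row with made-up values, only shaped like a dataset row.
    row = {
        "action_reprs": [
            "[textbox] From -> TYPE: New York",
            "[textbox] To -> TYPE: Boston",
            "[button] Search -> CLICK",
        ],
        "target_action_index": "2",
        "target_action_reprs": "[button] Search -> CLICK",
    }
    print(previous_actions(row))
```

The consistency check against target_action_reprs is there so that an off-by-one assumption about the index (0-based vs. 1-based) fails loudly instead of silently returning the wrong history.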