What is the meaning of an empty `pos_candidate`?

OSU-NLP-Group / SeeAct

[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).

The HuggingFace dataset `osunlp/Multimodal-Mind2Web` contains 761 rows with an empty `pos_candidates`.

These rows span 497 tasks, broken down by split as follows:

{'test_domain': 164, 'test_task': 47, 'test_website': 34, 'train': 252}
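
For anyone who wants to reproduce these counts, here is a minimal sketch of the tally. It assumes each row is a dict whose `pos_candidates` field is a list and whose `annotation_id` field identifies the task; the helper name `summarize_empty_pos_candidates` and the toy rows are made up for illustration and are not part of the SeeAct repo.

```python
from collections import defaultdict


def summarize_empty_pos_candidates(rows_by_split):
    """Count rows with an empty `pos_candidates` and the tasks they belong to.

    rows_by_split: dict mapping a split name to an iterable of row dicts.
    Assumes each row has `pos_candidates` (a list) and `annotation_id`.
    Returns (number of empty rows, {split: number of affected tasks}).
    """
    empty_rows = 0
    tasks_per_split = defaultdict(set)
    for split, rows in rows_by_split.items():
        for row in rows:
            if len(row["pos_candidates"]) == 0:
                empty_rows += 1
                tasks_per_split[split].add(row["annotation_id"])
    return empty_rows, {s: len(t) for s, t in tasks_per_split.items()}


# Toy data standing in for the real splits (hypothetical values).
toy = {
    "train": [
        {"annotation_id": "a", "pos_candidates": []},
        {"annotation_id": "a", "pos_candidates": []},
        {"annotation_id": "b", "pos_candidates": ["x"]},
    ],
    "test_task": [
        {"annotation_id": "c", "pos_candidates": []},
    ],
}

empty_rows, tasks = summarize_empty_pos_candidates(toy)
print(empty_rows, tasks)
```

With the `datasets` library, the real splits (`train`, `test_task`, `test_website`, `test_domain`) could be loaded via `load_dataset("osunlp/Multimodal-Mind2Web", split=...)` and passed in place of the toy data; that was not run for this sketch.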

Here's a sample task in which one of the steps has an empty `pos_candidates`: https://huggingface.co/datasets/osunlp/Multimodal-Mind2Web/viewer/default/train?q=6687eb6c-7154-4176-83a8-e841f78089d9 (row 1659)

It appears that `src/data_utils/evaluation_utils.py` and `src/offline_experiments/screenshot_generation/*.py` treat an empty `pos_candidates` as a failure of the agent on that step. Since "A task is regarded as successful only if all steps have succeeded," a task containing such a step could then never be counted as successful, regardless of what the agent predicts. If that reading is right, it is unclear how to interpret the gap in "whole success rate" reported in Table 4.
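
To make the ambiguity concrete, here is a minimal sketch comparing two possible treatments of such steps: counting them as failures versus skipping them. This is not the repo's evaluation code. The function name, the `step_correct` field, and the `empty_policy` parameter are all hypothetical, and `annotation_id` is assumed to identify the task.

```python
from collections import defaultdict


def whole_task_success_rate(steps, empty_policy="fail"):
    """Fraction of tasks in which every counted step succeeded.

    steps: iterable of dicts with `annotation_id`, `pos_candidates` (list),
           and `step_correct` (bool, hypothetical per-step outcome).
    empty_policy: "fail" counts a step with empty `pos_candidates` as failed;
                  "skip" leaves such a step out of the task's outcome.
    """
    per_task = defaultdict(list)
    for step in steps:
        outcomes = per_task[step["annotation_id"]]  # registers the task
        if not step["pos_candidates"]:
            if empty_policy == "skip":
                continue
            outcomes.append(False)
        else:
            outcomes.append(step["step_correct"])
    if not per_task:
        return 0.0
    # Note: a task whose steps were all skipped counts as successful here.
    return sum(all(v) for v in per_task.values()) / len(per_task)


# Toy steps (hypothetical): task t1 has one step with no positive candidate.
toy_steps = [
    {"annotation_id": "t1", "pos_candidates": ["x"], "step_correct": True},
    {"annotation_id": "t1", "pos_candidates": [], "step_correct": True},
    {"annotation_id": "t2", "pos_candidates": ["y"], "step_correct": True},
    {"annotation_id": "t2", "pos_candidates": ["z"], "step_correct": True},
    {"annotation_id": "t3", "pos_candidates": ["w"], "step_correct": False},
]

rate_fail = whole_task_success_rate(toy_steps, empty_policy="fail")
rate_skip = whole_task_success_rate(toy_steps, empty_policy="skip")
print(rate_fail, rate_skip)
```

In this toy example, task `t1` succeeds only under the "skip" policy, so the two treatments give different whole-task success rates on identical predictions.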

What is the meaning of an empty `pos_candidate`? #43