ServiceNow / WorkArena

WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
https://servicenow.github.io/WorkArena/

Error in the cheat function (oracle solver) #28

Open dgjun32 opened 1 month ago

dgjun32 commented 1 month ago

I observed errors in the oracle for several task types, mainly the sort and filter task types in the List category.

For example, in the 18th episode of the workarena.servicenow.sort-user-list task type, the action sequence proposed by the oracle is:

click('a46') -> click('a1005') -> select_option('a1183', 'Department') -> select_option('a1240', 'descending') -> click('a1005').

However, there is no element with bid 'a1183' in the third observation, and running this action sequence in the WorkArena environment raises an execution error (because 'a1183' does not exist in the third observation).
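
A quick way to reproduce this kind of report is to check each oracle action against the text observation it is paired with. The snippet below is only a sketch: `missing_bids` is a hypothetical helper, the sample observation text is made up, and it assumes that bids appear in the flattened AXTree text as `[bid]` and that the bid is the first quoted argument of each action.

```python
import re
from typing import List

# Assumption: the bid is the first single-quoted argument of an action call,
# e.g. click('a46') or select_option('a1183', 'Department').
BID_IN_ACTION = re.compile(r"\(\s*'([^']+)'")


def missing_bids(action: str, axtree_text: str) -> List[str]:
    """Return the bids used by `action` that do not show up in `axtree_text`.

    Assumption: the text observation renders each element as "[bid] role name".
    """
    bids = BID_IN_ACTION.findall(action)
    return [bid for bid in bids if f"[{bid}]" not in axtree_text]


# Made-up observation text, only to illustrate the check.
sample_obs = "[a46] button 'Personalize'\n[a1005] button 'Run'"

ok = missing_bids("click('a46')", sample_obs)
broken = missing_bids("select_option('a1183', 'Department')", sample_obs)
print(ok, broken)
```

Running the same check over every (observation, action) pair in a trace would list exactly which steps reference a bid the agent cannot see.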

dgjun32 commented 1 week ago

Hello, is there any update on this issue?

aldro61 commented 1 week ago

Hi @dgjun32, I'm not sure what you are referring to. The Oracle functions (.cheat) rely on Playwright code, not BrowserGym primitives. Can you provide more context? Where are you facing this problem?

optimass commented 1 week ago

Hi @dgjun32,

There's no guarantee that the bids used by the Oracle will also be present in the AX tree that the agent observes.

Hope this helps!

dgjun32 commented 1 week ago

Hi @aldro61, I am referring to the code in scripts/extract_finetuning_traces.py.

def extract_trace(task_cls, headless=True):
    """
    Extracts the trace of actions and observations for a given task.

    Parameters:
    ------------
    task_cls: class
        The class of the task to extract the trace from.

    """
    # Instantiate a new environment
    env = BrowserEnv(task_entrypoint=task_cls, headless=headless, slow_mo=1000)

    # Setup customized tracing
    trace = []
    monkey_patch_playwright(observation_callback=env._get_obs, trace_storage=trace)

    env.reset()
    env.task.cheat(env.page, env.chat.messages)
    env.close()

    return trace

dgjun32 commented 1 week ago

@optimass, thanks for the answer. I am trying to extract oracle demonstrations for fine-tuning an LLM agent. However, if the bids used by the oracle are not present in the AX tree that the agent observes, is it still a valid demonstration? For example, if the oracle action at time step t is click(bid='a1300') and the bid 'a1300' does not exist in the observation at time step t, what does 'a1300' refer to?

optimass commented 1 week ago

@gasse can you clarify my point please?

gasse commented 6 days ago

Hi all. Yes, that's something we've already identified. Basically, the oracle (cheat() function) might click on generic elements like div and span, but these elements are not rendered by the flatten_axtree_to_str() method by default, so you don't see their bids in the AXTree.

@dgjun32 can you try printing the AXTree with skip_generic=False instead? I think then you should see the missing bids. https://github.com/ServiceNow/BrowserGym/blob/89eccd555f413112c4e7687b5395ffd1f2c00862/browsergym/core/src/browsergym/utils/obs.py#L287
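
To make the effect of that flag concrete, here is a toy illustration of how skipping generic nodes hides a bid from the text view. It is not BrowserGym's implementation: the node format, `TOY_TREE`, and `flatten_toy_tree` are all hypothetical, and the bid 'a1300' is borrowed from the hypothetical example earlier in the thread purely as a label.

```python
from typing import Any, Dict, List

# Hypothetical, simplified node format. This is NOT BrowserGym's AXTree schema.
TOY_TREE: Dict[str, Any] = {
    "role": "RootWebArea", "name": "Users", "bid": "a1",
    "children": [
        {"role": "button", "name": "Sort", "bid": "a1005", "children": []},
        # A clickable <div>: exposed with the "generic" role.
        {"role": "generic", "name": "", "bid": "a1300", "children": [
            {"role": "option", "name": "Department", "bid": "a1310", "children": []},
        ]},
    ],
}


def flatten_toy_tree(node: Dict[str, Any], skip_generic: bool = True,
                     depth: int = 0) -> List[str]:
    """Render the toy tree as text lines, optionally hiding generic nodes."""
    lines: List[str] = []
    hidden = skip_generic and node["role"] == "generic"
    if not hidden:
        lines.append("\t" * depth + f"[{node['bid']}] {node['role']} {node['name']!r}")
    # Children of a hidden node are still rendered, at the hidden node's depth.
    child_depth = depth if hidden else depth + 1
    for child in node["children"]:
        lines.extend(flatten_toy_tree(child, skip_generic, child_depth))
    return lines


default_view = "\n".join(flatten_toy_tree(TOY_TREE))
full_view = "\n".join(flatten_toy_tree(TOY_TREE, skip_generic=False))
print(default_view)
print(full_view)
```

In the default view the generic node's bid is absent while its child is still listed, which matches the symptom described above. With the real library, the equivalent step would be passing skip_generic=False to flatten_axtree_to_str() on the AXTree object taken from the observation.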