ServiceNow / WorkArena

WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
https://servicenow.github.io/WorkArena/

Error in task.validate() #23

Closed: dgjun32 closed this issue 2 months ago

dgjun32 commented 2 months ago

I am experimenting with a GPT-4o agent, but I encountered an error when calling env.step():

obs, reward, terminated, truncated, info = env.step(action)
  File "/home/dongjun/anaconda3/envs/workarena/lib/python3.10/site-packages/gymnasium/wrappers/order_enforcing.py", line 56, in step
    return self.env.step(action)
  File "/home/dongjun/anaconda3/envs/workarena/lib/python3.10/site-packages/gymnasium/wrappers/env_checker.py", line 51, in step
    return self.env.step(action)
  File "/home/dongjun/anaconda3/envs/workarena/lib/python3.10/site-packages/browsergym/core/env.py", line 374, in step
    reward, done, user_message, task_info = self._task_validate()
  File "/home/dongjun/anaconda3/envs/workarena/lib/python3.10/site-packages/browsergym/core/env.py", line 399, in _task_validate
    reward, done, user_message, info = self.task.validate(self.page, self.chat.messages)
  File "/home/dongjun/anaconda3/envs/workarena/lib/python3.10/site-packages/browsergym/workarena/tasks/dashboard.py", line 464, in validate
    _, chart_data, _ = self._get_chart_by_title(page, self.config["chart_title"])
  File "/home/dongjun/anaconda3/envs/workarena/lib/python3.10/site-packages/browsergym/workarena/tasks/dashboard.py", line 209, in _get_chart_by_title
    title = charts[0][0]
IndexError: list index out of range

This happened in the 6th episode of the single-chart-value-retrieval task. There seems to be a problem in the task.validate() function. Have you seen this error before?

aldro61 commented 2 months ago

I've encountered this error sporadically in the past. It usually happens because network latency causes the page to load slowly. Could your connection be slow, or are you running a large number of parallel evaluations on the same instance? Parallel evaluation works, but if you start running into latency issues, try reducing the number of concurrent jobs.
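A minimal sketch of one way to tolerate slow page loads, under stated assumptions: the traceback shows `_get_chart_by_title` evaluating `charts[0][0]`, which raises `IndexError` when no chart has been found yet. The helper `poll_until_nonempty` below is hypothetical (not a WorkArena or BrowserGym API); it retries a lookup callable until it returns a non-empty list or a timeout expires. `fetch_charts` and its chart data are made-up stand-ins for whatever code collects the charts from the page.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def poll_until_nonempty(
    fetch: Callable[[], List[T]],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> List[T]:
    """Call `fetch` repeatedly until it returns a non-empty list.

    Hypothetical helper, not part of WorkArena/BrowserGym. Raises TimeoutError
    instead of letting a later `result[0]` fail with an opaque IndexError.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no results after {timeout:.1f}s; page may still be loading")
        time.sleep(interval)


# Stand-in lookup (hypothetical data) that only "finds" the chart on the third
# call, simulating a page that renders slowly.
calls = {"n": 0}


def fetch_charts():
    calls["n"] += 1
    return [("Incidents by priority", {"P1": 4})] if calls["n"] >= 3 else []


charts = poll_until_nonempty(fetch_charts, timeout=5.0, interval=0.01)
title = charts[0][0]  # safe: charts is guaranteed non-empty here
print(title)
```

Turning the empty list into an explicit `TimeoutError` makes it clear that the page had not finished loading, rather than suggesting a bug in the validation logic.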

dgjun32 commented 2 months ago

I did not run multiple evaluations in parallel, so I think the network connection might be the problem.

aldro61 commented 2 months ago

Could you please try again and make sure your internet connection is fast (e.g., by running a speed test)? Let me know if the issue persists. Thanks!

dgjun32 commented 2 months ago

Is there a minimum connection speed required to run this environment? My connection speed seems to be around 8 MB/s.

aldro61 commented 2 months ago

That sounds like a decent speed. When we get timeout errors, we typically just restart the episode. These errors are inevitable since the benchmark is remotely hosted, but they occur very rarely. To help me understand what's going on, could you please share the URL and password of the problematic instance, and also report the output of the ping command for the instance URL?
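The restart-on-error approach described above can be sketched as follows. This is a minimal sketch under assumptions: `run_episode(seed)` is a hypothetical callable that runs one full episode (reset, agent loop, `env.step`) and returns its final reward, and the set of exceptions treated as transient (`IndexError`, `TimeoutError`) and the attempt limit are illustrative choices based on this thread, not WorkArena settings.

```python
from typing import Callable, Optional, Tuple, Type

# Exceptions treated as transient (assumption based on the traceback in this issue).
TRANSIENT_ERRORS: Tuple[Type[BaseException], ...] = (IndexError, TimeoutError)


def run_with_restarts(
    run_episode: Callable[[int], float],
    seed: int,
    max_attempts: int = 3,
) -> float:
    """Run an episode, restarting it from scratch (same seed) on transient errors."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_episode(seed)
        except TRANSIENT_ERRORS as err:
            last_error = err
            print(f"attempt {attempt} failed with {type(err).__name__}: {err}; restarting")
    raise RuntimeError(f"episode {seed} failed after {max_attempts} attempts") from last_error


# Stand-in episode that fails once (as if validation hit a slow page), then succeeds.
attempts = []


def flaky_episode(seed: int) -> float:
    attempts.append(seed)
    if len(attempts) == 1:
        raise IndexError("list index out of range")
    return 1.0


reward = run_with_restarts(flaky_episode, seed=6)
print(reward)
```

Reusing the same seed keeps the retried episode on the same task instance, so a restart does not change what is being evaluated. In a real evaluation loop you would likely also close and recreate the environment before retrying; that step is omitted here.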

Thanks!