THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0

Cannot run webshop task correctly #68

Closed: lynneChan closed this issue 8 months ago

lynneChan commented 8 months ago

Because my environment cannot run Docker containers, I can only use the code in the v0.1 branch. I followed your instructions and installed all the dependencies. I also set up the environment for WebShop by running the script src/tasks/webshop/setup.sh. After all this, running eval.py with configs/tasks/webshop/dev.yaml produces the following errors:

Keys cleaned.
Attributes loaded.
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 40030.39it/s]
0 skipped
Loaded 13 goals.
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/multiprocess/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.10/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/notebook/code/personal/LLM/AgentBench/src/tasks/webshop/__init__.py", line 42, in predict
    env.reset(data_item)
  File "/home/notebook/code/personal/LLM/AgentBench/src/tasks/webshop/web_agent_site/envs/web_agent_text_env.py", line 251, in reset
    self.browser.get(init_url, session_id=self.session, session_int=session_int)
  File "/home/notebook/code/personal/LLM/AgentBench/src/tasks/webshop/web_agent_site/envs/web_agent_text_env.py", line 619, in get
    self.server.receive(self.session_id, self.current_url, session_int=session_int)
  File "/home/notebook/code/personal/LLM/AgentBench/src/tasks/webshop/web_agent_site/envs/web_agent_text_env.py", line 514, in receive
    goal = self.goals[idx]
IndexError: list index out of range

WebShop loads only 13 goals from the data, but dev.yaml needs at least 280 goals. I also checked the code in the v0.2 branch, and the WebShop code there is nearly identical. How can I fix this?
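The IndexError above means a goal index requested by the task is larger than the number of goals actually loaded. A minimal sketch of a pre-flight check that reports this mismatch clearly instead of failing deep inside `receive`; `check_goal_coverage` is a hypothetical helper, not part of AgentBench or WebShop, and `range(280)` only mirrors the 280 goals mentioned above.

```python
from typing import Iterable, Sequence


def check_goal_coverage(goals: Sequence, indices: Iterable[int]) -> None:
    """Raise a descriptive ValueError if any requested goal index is out of range.

    Hypothetical helper: `goals` stands in for the goal list the WebShop
    server loads, `indices` for the goal indices a task config asks for.
    Negative indices are treated as out of range as well.
    """
    out_of_range = sorted(i for i in set(indices) if not 0 <= i < len(goals))
    if out_of_range:
        raise ValueError(
            f"Config requests goal index {out_of_range[-1]}, but only {len(goals)} "
            f"goals were loaded ({len(out_of_range)} index(es) out of range). "
            "Check which product/attribute data files are being loaded."
        )
```

For example, with 13 loaded goals, `check_goal_coverage(goals, range(280))` raises a ValueError that names index 279 and the 13 loaded goals.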

zhc7 commented 8 months ago

Hi, could you please try to uncomment src/tasks/webshop/web_agent_site/utils.py L11 and L13?
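To apply the suggested change from a script (for example, as part of a setup step), a minimal sketch follows. `uncomment_lines` is a hypothetical helper that only strips a leading `#` from the given 1-indexed lines; it does not know what those lines contain, and L11 and L13 refer to utils.py as it was when this comment was written, so check the file before and after running it.

```python
from pathlib import Path


def uncomment_lines(path: str, line_numbers: set[int]) -> None:
    """Remove a leading '#' (and one following space) from the given 1-indexed lines.

    Indentation before the '#' is preserved; lines that are not comments
    are left unchanged. The file is rewritten in place.
    """
    file = Path(path)
    lines = file.read_text().splitlines(keepends=True)
    for n in line_numbers:
        line = lines[n - 1]
        stripped = line.lstrip()
        if stripped.startswith("#"):
            indent = line[: len(line) - len(stripped)]
            body = stripped[1:]
            # Drop the single space that conventionally follows '#'.
            if body.startswith(" "):
                body = body[1:]
            lines[n - 1] = indent + body
    file.write_text("".join(lines))
```

Usage: `uncomment_lines("src/tasks/webshop/web_agent_site/utils.py", {11, 13})`.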

lynneChan commented 8 months ago

That fixes the error and loads 12087 goals, but does this change the data used for evaluation and affect the results?

zhc7 commented 8 months ago

Actually, this is the data used for the real evaluation; the former isn't. It was initially configured to load only partial data because the code is adopted directly from https://github.com/princeton-nlp/WebShop, and those 13 goals are for demonstration purposes.

lynneChan commented 8 months ago

OK, thanks for your reply! The code may need to be corrected, or an instruction added to the README.