THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0

[Bug/Assistance] webshop failed #56

Closed nlpcat closed 8 months ago

nlpcat commented 9 months ago

When we add webshop-dev to start_task.yaml, we get the traceback below:

definition:
  import: tasks/task_assembly.yaml

start:
  dbbench-std: 5
  os-std: 5
  webshop-std: 5

Traceback (most recent call last):
  File "/root/miniconda3/envs/webshop/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/webshop/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/workspace/src/server/task_worker.py", line 256, in <module>
    asyncio_task = InstanceFactory.parse_obj(conf[args.name]).create()
  File "/root/workspace/src/typings/general.py", line 34, in create
    mod = __import__(path, fromlist=[self.module.split(".")[-1]])
ModuleNotFoundError: No module named 'src.server.tasks.webshop_docker'

This seems to be caused by the following error:

ln: failed to create symbolic link '/root/workspace/src/server/tasks/webshop_docker': File exists

I checked the Docker image and it already has /root/webshop, so the ln -s command won't work.
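
One way to make the link step safe to re-run is to clear whatever already sits at the link path before linking. The following is a minimal sketch, not AgentBench code: `ensure_symlink` is a hypothetical helper, and pointing the link at /root/webshop is an assumption based on the image contents described above.

```python
import os
from pathlib import Path


def ensure_symlink(link: Path, target: Path) -> None:
    """Make `link` a symlink to `target`, replacing a stale file or link."""
    if link.is_symlink():
        if os.readlink(link) == str(target):
            return  # already correct, nothing to do
        link.unlink()  # points somewhere else (or is broken), so replace it
    elif link.is_file():
        link.unlink()  # a stray regular file, as reported in this issue
    elif link.exists():
        # A real directory may hold data, so refuse to delete it silently.
        raise FileExistsError(f"{link} is a directory; remove it manually")
    link.symlink_to(target, target_is_directory=True)
```

Under those assumptions, the call would be `ensure_symlink(Path("/root/workspace/src/server/tasks/webshop_docker"), Path("/root/webshop"))`. Refusing to touch a real directory is a deliberate choice so the helper never deletes data it did not create.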

nlpcat commented 9 months ago

kg-std is actually not in start_task.yaml, so the README might be a bit confusing. The same issue applies to assignments/default.yaml.

nlpcat commented 9 months ago

@Longin-Yu @zhc7

zhc7 commented 9 months ago

> kg-std is actually not in start_task.yaml, so the README might be a bit confusing. The same issue applies to assignments/default.yaml.

Thanks for pointing this out. We've updated the README.

zhc7 commented 9 months ago

Would you try removing src/server/tasks/webshop_docker and start it again?

nlpcat commented 9 months ago

@zhc7 Yes, it works after I removed the src/server/tasks/webshop_docker file.

nlpcat commented 9 months ago

@zhc7 There seems to be a bug when we assign webshop-dev, although webshop-std works.

raise AgentBenchException(result.text, result.status_code, self.name)
src.typings.exception.AgentBenchException: ('{"detail":"Error: Task does not exist"}', 400, 'webshop-dev')

zhc7 commented 9 months ago

> @zhc7 Yes, it works after I removed the src/server/tasks/webshop_docker file.

Thanks. I'll remove this file from the repo.

zhc7 commented 9 months ago

> @zhc7 There seems to be a bug when we assign webshop-dev, although webshop-std works.
>
> raise AgentBenchException(result.text, result.status_code, self.name)
> src.typings.exception.AgentBenchException: ('{"detail":"Error: Task does not exist"}', 400, 'webshop-dev')

Have you started webshop-dev task_worker? Note that webshop-std and webshop-dev are independent. Try executing python -m src.start_task -s webshop-dev -p 7000 to manually start a task worker or add webshop-dev to start_task.yaml.
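
The "Task does not exist" error means an assignment named a task that no worker was started for. A quick pre-flight check can catch that before a run. This is only a sketch: it assumes both configs have already been loaded into plain Python structures, and `missing_workers` is a hypothetical helper rather than an AgentBench function.

```python
from typing import Dict, Iterable, List


def missing_workers(start: Dict[str, int], assigned: Iterable[str]) -> List[str]:
    """Return assigned task names that have no worker in the `start` section."""
    return sorted({t for t in assigned if start.get(t, 0) <= 0})


# Mirrors the `start:` section posted at the top of this issue.
start = {"dbbench-std": 5, "os-std": 5, "webshop-std": 5}
print(missing_workers(start, ["webshop-std", "webshop-dev"]))  # ['webshop-dev']
```

Because webshop-std and webshop-dev are independent tasks, the check compares exact names and does not treat one variant as covering the other.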

nlpcat commented 9 months ago

Yes, it works after changing start_task.yaml.

But the ltp task failed with this error:

  File "/root/workspace/src/client/agents/claude_agent.py", line 1, in <module>
    import anthropic
ModuleNotFoundError: No module named 'anthropic'

The Docker image might need to be updated? I solved it by adding this to ltp.yaml:

command: pip install anthropic;
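
A missing dependency like this can also be detected up front instead of at import time. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper, and it assumes the pip package name matches the module name, which holds for anthropic but not for every package.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_modules(["anthropic"])
if missing:
    print("Install before starting the agent: pip install " + " ".join(missing))
```

`find_spec` only locates the module without importing it, so the check has no side effects on the environment being inspected.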

However, the card game task doesn't work: it reports that the worker is not responding. Could you help test the latest code and image for the card game task? Thanks. cc @zhc7

Longin-Yu commented 9 months ago

We've just uncovered this issue, and it may take some time to pinpoint it due to the complexity of the card_game task.

nlpcat commented 9 months ago

@Longin-Yu May I ask when the card game task will be fixed? Thanks.

Longin-Yu commented 9 months ago

We have located this bug and we are working on fixing it. We expect an update within this week.

zhc7 commented 9 months ago

@nlpcat Hi, we've fixed the problem in a069c7c. Feel free to try it!

Longin-Yu commented 8 months ago

Feel free to reopen this issue if it persists.