Open ibingzhaoi opened 5 months ago
Hi, @ibingzhaoi Are you running this on a Mac? If so, the cause may be https://github.com/THUDM/AgentBench/issues/84#issuecomment-1872249318
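Hello, I ran into the same problem. When running os-std, dbbench is currently the only task that runs normally; I cannot run any of the other tasks.
The errors include: "task error" for os-std and "AGENT_FAILED" for kg.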
——————————————————————————
Below is the output in run.json for os.
{"index": "std-007-bootstrap-00082", "error": null, "info": null, "output": {"index": "std-007-bootstrap-00082", "status": "task error", "result": "Traceback (most recent call last):\n File \"G:\\u674e\u67ef\u8fb0\\u6c5f\u82cf\u9716\u627f\u79d1\u6280\u6709\u9650\u516c\u53f8\\u5f00\u6e90\agentbench\src\server\task_worker.py\", line 108, in task_start_sample_wrapper\n result = await self.task.start_sample(index, session)\n File \"G:\\u674e\u67ef\u8fb0\\u6c5f\u82cf\u9716\u627f\u79d1\u6280\u6709\u9650\u516c\u53f8\\u5f00\u6e90\agentbench\src\server\tasks\os_interaction\task.py\", line 362, in start_sample\n container = Container(config.image)\n File \"G:\\u674e\u67ef\u8fb0\\u6c5f\u82cf\u9716\u627f\u79d1\u6280\u6709\u9650\u516c\u53f8\\u5f00\u6e90\agentbench\src\server\tasks\os_interaction\task.py\", line 37, in __init__\n self.sock = self.client.api.exec_start(self.exec_id, socket=True)._sock\nAttributeError: 'NpipeSocket' object has no attribute '_sock'\n", "history": []}, "time": {"timestamp": 1708589987869, "str": "2024-02-22 16:19:47"}}
This is the state of Docker.
————————————————————
Below is the error I hit when running kg.
python -m src.start_task -a
INFO: Started server process [38924]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:5000 (Press CTRL+C to quit)
INFO: 127.0.0.1:29878 - "GET /api/list_workers HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "C:\Users\78523\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\78523\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "G:\agentbench\src\start_task.py", line 129, in
@ibingzhaoi @zhc7 @cenyk1230 @Btlmd
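Hi, @Joe-2002 We can discuss this in a separate issue. The code that raises the error is a system-dependent implementation; it does not fail when run on Linux. AgentFailed may be caused by poor communication with the agent server.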
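For anyone who wants to experiment on Windows anyway, here is a minimal, untested sketch of one possible workaround. The traceback above shows that the object returned by `exec_start(..., socket=True)` is an `NpipeSocket` with no `_sock` attribute, so the idea is to fall back to the object itself when `_sock` is missing. The helper name `unwrap_exec_socket` and the two stand-in classes are hypothetical and are not part of AgentBench or the Docker SDK; whether the rest of the os_interaction task works with the object obtained this way has not been verified.

```python
def unwrap_exec_socket(stream):
    """Return stream._sock when that attribute exists, otherwise the stream itself.

    Hypothetical helper: it only avoids the AttributeError from the traceback,
    it does not guarantee the returned object behaves like a plain socket.
    """
    return getattr(stream, "_sock", stream)


class FakeSocketIO:
    """Stand-in for a wrapper object that exposes the raw socket as _sock."""

    def __init__(self, raw):
        self._sock = raw


class FakeNpipeSocket:
    """Stand-in for an object that has no _sock attribute."""


if __name__ == "__main__":
    raw = object()
    # With a wrapper, the inner object is returned.
    print(unwrap_exec_socket(FakeSocketIO(raw)) is raw)
    # Without _sock, the object itself is returned instead of raising.
    npipe = FakeNpipeSocket()
    print(unwrap_exec_socket(npipe) is npipe)
```

In task.py the assignment on line 37 would then wrap the `exec_start` result in this helper instead of reading `._sock` directly. Running on Linux remains the supported path.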
For the db task, AGENT_FAILED occurs frequently, so the over_all file is never written. How can I compute overall_cat_accuracy manually?
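Hi, @qzd-1 You can read the result of each successful run in runs.jsonl manually and then compute the accuracy from those. "cat" stands for categorical: compute the accuracy separately for each category (SELECT, INSERT, UPDATE), then average the per-category accuracies with equal weights.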
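A rough sketch of that calculation in Python is below. The averaging function follows the description above. The loader is an assumption: the top-level `error` and `output` keys match the records pasted earlier in this thread, but the `type` and `is_correct` keys inside `result` are hypothetical placeholders, so check a successful line in your own runs.jsonl and adjust them.

```python
import json
from collections import defaultdict


def macro_category_accuracy(samples):
    """Compute per-category accuracy and their equally weighted mean.

    samples: iterable of (category, is_correct) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for category, is_correct in samples:
        totals[category] += 1
        hits[category] += int(bool(is_correct))
    if not totals:
        raise ValueError("no successful samples to score")
    per_category = {c: hits[c] / totals[c] for c in totals}
    # Equal weight per category, regardless of how many samples each has.
    overall = sum(per_category.values()) / len(per_category)
    return per_category, overall


def load_successful_samples(path):
    """Yield (category, is_correct) pairs from a runs.jsonl-style file.

    ASSUMPTION: 'type' and 'is_correct' are hypothetical field names.
    Records with a non-null error or no output are skipped.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            output = record.get("output")
            if record.get("error") is not None or not output:
                continue
            result = output.get("result")
            if not isinstance(result, dict):
                continue
            yield result["type"], result["is_correct"]


if __name__ == "__main__":
    demo = [("SELECT", True), ("SELECT", False), ("INSERT", True), ("UPDATE", False)]
    per_category, overall = macro_category_accuracy(demo)
    print(per_category, overall)
```

Note that samples which ended in AGENT_FAILED are simply left out here, so the number is only comparable to an official score if the same samples are counted.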
Describe the bug
Is there a problem with these images on Ubuntu? Docker starts, but every run with GPT4/GPT3 fails: longinyu/agentbench-ltp, longinyu/agentbench-mind2web, longinyu/agentbench-card_game, longinyu/agentbench-alfworld
The error for cg is as follows: {"index": 9, "error": null, "info": null, "output": {"index": 9, "status": "task error", "result": "Traceback (most recent call last):\n File \"/root/workspace/src/server/task_worker.py\", line 108, in task_start_sample_wrapper\n result = await self.task.start_sample(index, session)\n File \"/root/workspace/src/server/tasks/card_game/task.py\", line 134, in start_sample\n await task\n File \"/root/workspace/src/server/tasks/card_game/server.py\", line 29, in start\n data = client_socket.recv(1000000).decode()\nUnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 4: invalid continuation byte\n", "history": []}, "time": {"timestamp": 1706327091710, "str": "2024-01-27 03:44:51"}}
{"index": 18, "error": "START_FAILED", "info": "{\"detail\":\"Error: Worker not responding\n\"}", "output": null, "time": {"timestamp": 1706325926451, "str": "2024-01-27 03:25:26"}}
Is something configured incorrectly somewhere?
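One way to narrow down the cg error might be to look at the raw bytes that `recv` returned. The message "invalid continuation byte" means the byte following 0xe4 is not a valid UTF-8 continuation, which suggests the payload is not UTF-8 at all rather than a valid string cut off at a chunk boundary. The sketch below is a debugging aid only; `decode_for_debugging` is a hypothetical helper, not part of AgentBench, and the sample bytes are made up to reproduce the same error message.

```python
def decode_for_debugging(payload: bytes) -> str:
    """Decode as UTF-8; on failure, report the offending bytes and decode leniently."""
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Show a few bytes around the failure so the real encoding can be guessed.
        snippet = payload[exc.start:exc.start + 8]
        print(f"not valid UTF-8 at offset {exc.start}: {snippet!r} ({exc.reason})")
        # Invalid bytes become U+FFFD instead of raising.
        return payload.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Made-up payload: 0xe4 at position 4 followed by a non-continuation byte.
    print(decode_for_debugging(b"abcd\xe4bc"))
```

If the printed bytes turn out to be text in another encoding, the client inside the container is probably emitting non-UTF-8 output, which would point at the locale or environment rather than at the agent configuration.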