THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0

[Bug/Assistance] os-std: one sample fails with "Worker not responding" #105

Open Xccanxin opened 5 months ago

Xccanxin commented 5 months ago

Q: In the os dataset, the sample os-std-003-ac-00000 has been retried several times and always fails with "Worker not responding". The other 143 samples are evaluated normally. The error message is:

    Warning: chatglm2-6b/os-std#std-003-ac-00000 failed with error INTERACT_FAILED {"detail":"Error: Worker not responding\n"} index=None status=<SampleStatus.RUNNING: 'running'> result=None history=None

Model: chatglm2 + fastchat

I have tried setting the number of workers in start_task.yaml to 1 or 5, but the error is unchanged.

start_task.yaml

definition:
  import: tasks/task_assembly.yaml

start:
  os-std: 1

default.yaml

import: definition.yaml

concurrency:
  task:
    os-std: 1
  agent:
    chatglm2-6b: 1

assignments: # List[Assignment] | Assignment
  - agent: # "task": List[str] | str ,  "agent": List[str] | str
      - chatglm2-6b
    task:
      - os-std

output: "outputs/{TIMESTAMP}"

Failing sample:

    {
        "description": "Tell me the number of CPUs.",
        "evaluation": {
            "check": [
                null,
                {
                    "language": "python",
                    "file": "check/integer-match.py"
                }
            ],
            "example": "nproc"
        },
        "labels": [
            "command",
            "CPU",
            "device",
            "hardware",
            "processor",
            "system"
        ]
    },

zhc7 commented 5 months ago

Hi @Xccanxin, this can happen when the worker hangs: the command issued by the agent may contain an infinite loop or a very long wait.
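
To illustrate the failure mode, here is a minimal sketch of how a shell command that never returns can be cut off by a timeout instead of blocking the caller. It is not AgentBench's code: the helper name `run_with_timeout`, the 5-second default, and running through `sh -c` are all assumptions made for illustration, and it is POSIX-only.

```python
import os
import signal
import subprocess


def run_with_timeout(command: str, timeout_s: float = 5.0) -> tuple[str, str]:
    """Run a shell command and return (status, output).

    Hypothetical helper: status is "ok" if the command finished in time,
    "timeout" if it had to be killed.
    """
    proc = subprocess.Popen(
        ["sh", "-c", command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # own process group, so children can be killed too
    )
    try:
        output, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Kill the whole process group, not just the shell, so that a looping
        # child (e.g. `sleep` inside `while true`) does not keep the pipe open.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.communicate()  # reap the process and drain the pipe
        return "timeout", ""
    return "ok", output.strip()


if __name__ == "__main__":
    # A command like the sample's reference answer returns immediately.
    print(run_with_timeout("nproc"))
    # A command with an infinite loop is stopped after one second.
    print(run_with_timeout("while true; do sleep 1; done", timeout_s=1.0))
```

Running the agent's last command by hand in this way (or directly in a shell) may help confirm whether the command itself is what hangs for this sample.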