THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0

Cannot start properly; accessing a task raises an error #53

Closed Dhaizei closed 7 months ago

Dhaizei commented 9 months ago

INFO: 127.0.0.1:45654 - "GET /api/get_indices?name=dbbench-std HTTP/1.1" 200 OK
INFO: 127.0.0.1:45656 - "GET /api/get_indices?name=os-std HTTP/1.1" 400 Bad Request

After running python -m src.start_task -a (with no configuration changes):

<class 'src.server.tasks.os_interaction.task.OSInteraction'>
Traceback (most recent call last):
  File "/root/anaconda3/envs/py38/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/py38/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/work/AgentBenchV0.2/src/server/task_worker.py", line 256, in <module>
    asyncio_task = InstanceFactory.parse_obj(conf[args.name]).create()
  File "/root/work/AgentBenchV0.2/src/typings/general.py", line 37, in create
    return getattr(mod, self.module.split(".")[-1])(**self.parameters)
  File "/root/work/AgentBenchV0.2/src/server/tasks/os_interaction/task.py", line 275, in __init__

After running python -m src.assigner, accessing os-std raises an error:

<class 'src.client.task.TaskClient'>
TaskClient created: os-std (http://localhost:5000/api)
Traceback (most recent call last):
  File "/root/anaconda3/envs/py38/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/anaconda3/envs/py38/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/work/AgentBenchV0.2/src/assigner.py", line 402, in <module>
    Assigner(value, args.retry).start()
  File "/root/work/AgentBenchV0.2/src/assigner.py", line 74, in __init__
    self.task_indices[task] = self.tasks[task].get_indices()
  File "/root/work/AgentBenchV0.2/src/client/task.py", line 31, in get_indices
    raise AgentBenchException(result.text, result.status_code, self.name)
src.typings.exception.AgentBenchException: ('{"detail":"Error: Task does not exist"}', 400, 'os-std')

zhc7 commented 9 months ago

Could you try Python 3.9? The error is AttributeError: 'str' object has no attribute 'removesuffix'; this str method was added in Python 3.9.
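
For anyone who has to stay on Python 3.8, the missing method can be approximated with a small helper. This is only a minimal sketch: the name `remove_suffix` is hypothetical and is not part of AgentBench, and upgrading to Python 3.9 as suggested above remains the simpler route.

```python
def remove_suffix(text: str, suffix: str) -> str:
    """Stand-in for str.removesuffix, which only exists from Python 3.9 on."""
    # An empty suffix must be a no-op; text[:-0] would wrongly return "".
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text


print(remove_suffix("task.yaml", ".yaml"))  # task
print(remove_suffix("task.yaml", ".json"))  # task.yaml
```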

Dhaizei commented 9 months ago

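Thank you very much for your answer. For now I am running v0.1 first to get familiar with your work. Running bash scripts/build_docker.sh completed normally, but when I run python src/tasks/os_interaction/images.py build -c configs/tasks/os_interaction/dev.yaml -r . the output stops at:

~/work/AgentBenchv0.1# python src/tasks/os_interaction/images.py build -c configs/tasks/os_interaction/dev.yaml -r .
Building image: local-os/packages

It stays in this state and does not continue; there was no further output for a whole afternoon. What should I do?

I am running on WSL2 + Ubuntu 22.04. Does that matter?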

Longin-Yu commented 9 months ago

Could you try building the images directly and see whether that succeeds?

docker build -f data/os_interaction/res/dockerfiles/default data/os_interaction/res/dockerfiles
docker build -f data/os_interaction/res/dockerfiles/packages data/os_interaction/res/dockerfiles
docker build -f data/os_interaction/res/dockerfiles/ubuntu data/os_interaction/res/dockerfiles

Dhaizei commented 9 months ago

Thank you very much for your reply. I will try it when I have time and get back to you; I have been rather busy lately.

Dhaizei commented 9 months ago

The build succeeds.

Dhaizei commented 9 months ago

I noticed that there are no files for building a Docker image for alfworld.

zhc7 commented 9 months ago

Only the os task needs a few images built in advance. If you want to run alfworld, you can first run docker pull longinyu/agentbench-alfworld and then follow the tutorial.

Dhaizei commented 9 months ago

OK, thank you.

docker: permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/create": dial unix /var/run/docker.sock: connect: permission denied.

It looks like a permission problem with my Docker setup.

zhc7 commented 9 months ago

Yes, you need to use an account that has permission to run Docker.

Dhaizei commented 9 months ago

I have solved it by adding sudo in start_task.py. Here is what I changed:

subprocess.Popen(
    [
        "sudo",
        "docker",
        "run",
        "--rm",
        "--network",
        "host",
        "-v",
        f"{project_root}:/root/workspace",
        "-w",
        "/root/workspace",
        docker["image"],
        "bash",
        "-c",
        docker.get("command", "")
        + f" python -m src.server.task_worker {name}"
        f" --self http://localhost:{port}/api"
        f" --port {port}"
        f" --controller {controller}",
    ]
)
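
Hard-coding sudo works, but it also forces a password prompt on machines where the user can already reach the daemon. A possible alternative, sketched below under assumptions, is to add sudo only when the Docker socket is not accessible. The helper name `docker_command` is hypothetical and not part of AgentBench; the default socket path is the one shown in the error message above.

```python
import os
from typing import List


def docker_command(args: List[str], socket_path: str = "/var/run/docker.sock") -> List[str]:
    """Build a docker invocation, prefixing sudo only when the socket is not accessible."""
    # os.access checks whether the current user may read and write the socket file.
    needs_sudo = not os.access(socket_path, os.R_OK | os.W_OK)
    prefix = ["sudo"] if needs_sudo else []
    return prefix + ["docker"] + args


# The result depends on the machine this runs on.
print(docker_command(["run", "--rm", "--network", "host", "some-image"]))
```

Adding the user to the docker group (for example with sudo usermod -aG docker $USER, followed by logging in again) is the other commonly used way to avoid sudo altogether.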

zhc7 commented 9 months ago

👍

Dhaizei commented 9 months ago

Could you explain the alfworld JSON file to me?

{
    "pick_and_place": [
        "json_2.1.1/valid_unseen/pick_and_place_simple-SoapBottle-None-Toilet-424/trial_T20190907_004404_604165/game.tw-pddl",
        "json_2.1.1/valid_unseen/pick_and_place_simple-Pencil-None-Shelf-308/trial_T20190908_122154_042763/game.tw-pddl",
        "json_2.1.1/valid_unseen/pick_and_place_simple-SaltShaker-None-Cabinet-10/trial_T20190906_191445_723170/game.tw-pddl",
        "json_2.1.1/valid_unseen/pick_and_place_simple-Mug-None-Desk-308/trial_T20190909_210238_431966/game.tw-pddl"
    ]
}

Are these files inside the longinyu/agentbench-alfworld Docker image?

Dhaizei commented 9 months ago

I have found them; they are in there. Thanks!

Dhaizei commented 9 months ago

In the alfworld-std task, in task.py:

result = requests.post(
    self.controller_address + "/start_sample",
    json=StartSampleRequest(name=self.name, index=index).dict(),
)

The output is:

{'session_id': 29, 'output': {'index': None, 'status': 'running', 'result': None, 'history': [{'role': 'user', 'content': 'Interact with a household to solve a task. Imagine you are an intelligent agent in a household environment and your target is to perform actions to complete the task goal. At the beginning of your interactions, you will be given the detailed description of the current environment and your goal to accomplish. For each of your turn, you will be given a list of actions which you can choose one to perform in this turn. You should choose from two actions: "THOUGHT" or "ACTION". If you choose "THOUGHT", you should first think about the current condition and plan for your future actions, and then output your action in this turn. Your output must strictly follow this format:"THOUGHT: your thoughts. \n ACTION: your next action\n"; If you choose "ACTION", you should directly output the action in this turn. Your output must strictly follow this format:"ACTION: your next action\n". After your each turn, the environment will give you immediate feedback based on which you plan your next few steps. if the environment output "Nothing happened", that means the previous action is invalid and you should try more options. \n Reminder: \n1. the action must be chosen from the given available actions. Any actions except provided available actions will be regarded as illegal. \n2. Think when necessary, try to act directly more in the process. \n\n'}, {'role': 'agent', 'content': "OK. I'll follow your instructions and try my best to solve the task. "}, {'role': 'user', 'content': '###Here is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. 
\nHere is one example. \nYou are in the middle of a room. Looking quickly around you, you see a coffeetable 1, a diningtable 1, a drawer 4, a drawer 3, a drawer 2, a drawer 1, a dresser 1, a garbagecan 1, a sidetable 2, a sidetable 1, and a sofa 1. \nYour task is to: put two cellphone in sofa. \n'}, {'role': 'agent', 'content': "THOUGHT: The task is to put two cellphone in sofa, so i need to first find a cellphone and put it on sofa. I'll start my search on coffeetables. \n ACTION: go to coffeetable 1\n"}, {'role': 'user', 'content': 'On the coffeetable 1, you see a box 1, a cellphone 3, and a plate 1. \n'}, {'role': 'agent', 'content': 'ACTION: take cellphone 3 from coffeetable 1\n'}, {'role': 'user', 'content': 'You pick up the cellphone 3 from the coffeetable 1. \n'}, {'role': 'agent', 'content': 'ACTION: go to sofa 1\n'}, {'role': 'user', 'content': 'On the sofa 1, you see a newspaper 2, a pillow 1, a remotecontrol 2, and a remotecontrol 1. \n'}, {'role': 'agent', 'content': 'ACTION: put cellphone 3 in/on sofa 1\n'}, {'role': 'user', 'content': 'You put the cellphone 3 in/on the sofa 1. \n'}, {'role': 'agent', 'content': 'ACTION: go to diningtable 1\n'}, {'role': 'user', 'content': 'On the diningtable 1, you see a cellphone 2, a keychain 2, a laptop 1, a statue 2, and a statue 1. \n'}, {'role': 'agent', 'content': 'ACTION: take cellphone 2 from diningtable 1\n'}, {'role': 'user', 'content': 'You pick up the cellphone 2 from the diningtable 1. \n'}, {'role': 'agent', 'content': 'ACTION: go to sofa 1\n'}, {'role': 'user', 'content': 'On the sofa 1, you see a cellphone 3, a newspaper 2, a pillow 1, a remotecontrol 2, and a remotecontrol 1. \n'}, {'role': 'agent', 'content': 'ACTION: put cellphone 2 in/on sofa 1\n'}, {'role': 'user', 'content': 'You put the cellphone 2 in/on the sofa 1. \n'}, {'role': 'agent', 'content': 'ACTION: Task succeeded. \n'}, {'role': 'user', 'content': 'Here is your task. You are in the middle of a room. 
Looking quickly around you, you see a cabinet 6, a cabinet 5, a cabinet 4, a cabinet 3, a cabinet 2, a cabinet 1, a coffeemachine 1, a countertop 3, a countertop 2, a countertop 1, a drawer 3, a drawer 2, a drawer 1, a fridge 1, a garbagecan 1, a microwave 1, a shelf 3, a shelf 2, a shelf 1, a sinkbasin 1, a stoveburner 4, a stoveburner 3, a stoveburner 2, a stoveburner 1, and a toaster 1. \nYour task is to: put two peppershaker in drawer. AVAILABLE ACTIONS: go to cabinet 1\ngo to cabinet 2\ngo to cabinet 3\ngo to cabinet 4\ngo to cabinet 5\ngo to cabinet 6\ngo to coffeemachine 1\ngo to countertop 1\ngo to countertop 2\ngo to countertop 3\ngo to drawer 1\ngo to drawer 2\ngo to drawer 3\ngo to fridge 1\ngo to garbagecan 1\ngo to microwave 1\ngo to shelf 1\ngo to shelf 2\ngo to shelf 3\ngo to sinkbasin 1\ngo to stoveburner 1\ngo to stoveburner 2\ngo to stoveburner 3\ngo to stoveburner 4\ngo to toaster 1\ninventory\nlook\n'}]}}

A lot of repeated content appears: Here is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example. \nHere is one example.

This is presumably meant to be a few-shot ReAct prompt, right?

Longin-Yu commented 9 months ago

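Thanks for the report; this is now fixed. Root cause: in v0.2 each Task Worker handles multiple samples at the same time, and the code that fetches the one-shot template did not copy it but modified the shared reference directly, so this sentence was added again for every sample.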
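
The aliasing problem described above can be reproduced in a few lines. This is an illustrative sketch only, not the actual AgentBench code: the function names and the one-message template are hypothetical stand-ins.

```python
import copy

HEADER = "Here is one example. \n"


def build_history_buggy(template):
    # Bug: 'history' is the same list object as the shared template,
    # so every call prepends the header to the template itself.
    history = template
    history[0]["content"] = HEADER + history[0]["content"]
    return history


def build_history_fixed(template):
    # Fix: work on a private deep copy. A shallow copy of the list would
    # not be enough, because the message dicts inside would still be shared.
    history = copy.deepcopy(template)
    history[0]["content"] = HEADER + history[0]["content"]
    return history


shared = [{"role": "user", "content": "You are in the middle of a room."}]
for _ in range(3):  # three samples handled by the same worker
    result = build_history_buggy(shared)
print(result[0]["content"].count("Here is one example."))  # 3

clean = [{"role": "user", "content": "You are in the middle of a room."}]
for _ in range(3):
    result = build_history_fixed(clean)
print(result[0]["content"].count("Here is one example."))  # 1
```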

Dhaizei commented 9 months ago

In data, what is the difference between dev and std? Is dev the test set of the original data?

Longin-Yu commented 9 months ago

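The main difference between dev and std (also called test) is the amount of data (for some tasks the distribution also differs somewhat). All results reported in the paper and the repo are on the std (test) set; dev is provided mainly so that developers can quickly measure performance improvements while training a model.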

Dhaizei commented 9 months ago

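So the results in the repo and the paper all come from the std data, dev is suited to quickly checking performance improvements, and neither set contains the other. Thanks; I would like to buy you a meal sometime.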

Longin-Yu commented 9 months ago

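So the results in the repo and the paper all come from the std data, dev is suited to quickly checking performance improvements, and neither set contains the other.

Yes.

Thank you for your interest in our work, and stay tuned! :sparkles: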

Dhaizei commented 9 months ago

When I was about to test the os data, the following appeared:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/agent-bench/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/user/miniconda3/envs/agent-bench/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/user/lsj/AgentBench/src/server/task_worker.py", line 256, in <module>
    asyncio_task = InstanceFactory.parse_obj(conf[args.name]).create()
  File "/home/user/lsj/AgentBench/src/typings/general.py", line 35, in create
    return getattr(mod, self.module.split(".")[-1])(**self.parameters)
  File "/home/user/lsj/AgentBench/src/server/tasks/os_interaction/task.py", line 259, in __init__
    super().__init__(**kwargs)
TypeError: __init__() missing 1 required positional argument: 'name'

However, the HH test now runs normally.

Longin-Yu commented 9 months ago

Thanks for the report. The os-dev entry in the config was missing the name field; this is now fixed.
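
The traceback above follows from how the task object is built: the config's parameters mapping is unpacked into the constructor, so a missing name key surfaces as a TypeError deep inside __init__. A simplified sketch of that mechanism, with an early check that gives a clearer message, is shown below. The classes `BaseTask` and `DemoTask` and the helper `create` are hypothetical stand-ins, not the real AgentBench classes.

```python
class BaseTask:
    def __init__(self, name: str, **kwargs):
        self.name = name


class DemoTask(BaseTask):
    def __init__(self, data_config=None, **kwargs):
        # 'name' has to arrive through kwargs, otherwise this call fails.
        super().__init__(**kwargs)
        self.data_config = data_config


def create(cls, parameters: dict):
    """Instantiate a task from config parameters, failing early on a missing name."""
    if "name" not in parameters:
        raise ValueError(f"{cls.__name__}: config entry is missing the 'name' field")
    return cls(**parameters)


task = create(DemoTask, {"name": "os-dev", "data_config": {}})
print(task.name)  # os-dev

try:
    DemoTask(**{"data_config": {}})  # same failure mode as in the traceback
except TypeError as exc:
    print("TypeError:", exc)
```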

Dhaizei commented 9 months ago

TaskClient created: os-std (http://localhost:5000/api)
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/agent-bench/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/user/miniconda3/envs/agent-bench/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/user/lsj/AgentBench/src/assigner.py", line 425, in <module>
    Assigner(value, args.retry).start(tqdm_out=orig_stdout)
  File "/home/user/lsj/AgentBench/src/assigner.py", line 94, in __init__
    self.task_indices[task] = self.tasks[task].get_indices()
  File "/home/user/lsj/AgentBench/src/client/task.py", line 31, in get_indices
    raise AgentBenchException(result.text, result.status_code, self.name)
src.typings.exception.AgentBenchException: ('{"detail":"Error: Task does not exist"}', 400, 'os-std')

It seems that after python -m src.start_task -a brings the service up, python -m src.assigner cannot find the task.

zhc7 commented 9 months ago

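The problem is probably this: the assigner tried to find the os-std task and did not find it. Perhaps the service you just started is os-dev rather than os-std? Could you check whether configs/start_task.yaml defines the default os-std?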

Dhaizei commented 9 months ago

Yes, I have checked everything. I set os-std, and everything matches.

Dhaizei commented 9 months ago

I will print out its config and check each item. For example, I just ran another test. This is the output of python -m src.start_task -a:

conf: {'docker': {'command': 'umask 0; [ -f /root/.setup.sh ] && bash /root/.setup.sh;', 'image': 'longinyu/agentbench-alfworld'}, 'module': 'src.server.tasks.alfworld.ALFWorld', 'parameters': {'name': 'alfworld-dev', 'data_path': '/AgentBench/data/alfworld', 'config_path': 'src/server/tasks/alfworld/configs/base_config.yaml', 'prompts_path': 'src/server/tasks/alfworld/prompts/alfworld_multiturn_plan_first.json', 'split': 'dev', 'max_step': 35}}

This is the output of python -m src.assigner:

Warning: 7 agent(s) and 15 task(s) are defined but not used, they will be ignored.
Agent: {'gpt-3.5-turbo-0613', 'text-davinci-002', 'vicuna-7b', 'vicuna-33b', 'vicuna-13b', 'text-davinci-003', 'wizard-30b'}
Task: {'cg-dev', 'dbbench-std', 'ltp-std', 'dbbench-dev', 'webshop-dev', 'webshop-std', 'm2w-dev', 'ltp-dev', 'alfworld-std', 'kg-dev', 'cg-std', 'm2w-std', 'kg-std', 'os-dev', 'os-std'}
creating alfworld-dev client...
TaskClient created: alfworld-dev (http://localhost:5000/api)
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/agent-bench/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/user/miniconda3/envs/agent-bench/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/user/lsj/AgentBench/src/assigner.py", line 425, in <module>
    Assigner(value, args.retry).start(tqdm_out=orig_stdout)
  File "/home/user/lsj/AgentBench/src/assigner.py", line 94, in __init__
    self.task_indices[task] = self.tasks[task].get_indices()
  File "/home/user/lsj/AgentBench/src/client/task.py", line 31, in get_indices
    raise AgentBenchException(result.text, result.status_code, self.name)
src.typings.exception.AgentBenchException: ('{"detail":"Error: Task does not exist"}', 400, 'alfworld-dev')

Longin-Yu commented 9 months ago

Could you copy and paste the contents of your configs/start_task.yaml so we can take a look?

Dhaizei commented 9 months ago

definition:
  import: tasks/task_assembly.yaml

start:
  alfworld-dev: 2

Longin-Yu commented 9 months ago

This file is the server-side configuration. The start field needs to include os-std, for example:

definition:
  import: tasks/task_assembly.yaml

start:
  alfworld-dev: 2
  os-std: 5

The configuration above starts 7 task workers in total (2 for alfworld-dev and 5 for os-std).
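
As a quick sanity check before running the assigner, the start mapping can be compared against the tasks the assigner is going to request. The sketch below assumes the start section has already been loaded into a plain dict; the helper names `total_workers` and `missing_tasks` are hypothetical and not part of AgentBench.

```python
from typing import Dict, List

# Mirrors the start: section of the example configuration.
start: Dict[str, int] = {"alfworld-dev": 2, "os-std": 5}


def total_workers(start: Dict[str, int]) -> int:
    """Number of task workers the server side would launch."""
    return sum(start.values())


def missing_tasks(start: Dict[str, int], requested: List[str]) -> List[str]:
    """Tasks the assigner wants that have no worker configured."""
    return [task for task in requested if start.get(task, 0) <= 0]


print(total_workers(start))                       # 7
print(missing_tasks(start, ["os-std", "os-dev"]))  # ['os-dev']
```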

Dhaizei commented 9 months ago

Yes, I understand that, and that is how I set it up. But the task still cannot be found; surprisingly, alfworld-std can be evaluated normally.

Dhaizei commented 9 months ago

image

Longin-Yu commented 9 months ago

Try restarting the task server (that is, the process started with python -m src.start_task -a).

Dhaizei commented 9 months ago

Yes, I have repeated it several times with the same result; the alfworld-dev task still is not detected.

INFO: 127.0.0.1:57618 - "GET /api/get_indices?name=alfworld-dev HTTP/1.1" 400 Bad Request

Dhaizei commented 9 months ago

os-dev and os-std can now be evaluated normally.

Dhaizei commented 9 months ago

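I found the problem. Whenever I have a large download running, some processes fail to bind to the IP address and therefore do not start. Stopping the download fixes it; the main issue is network congestion.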
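
When a worker silently fails to start like this, one way to narrow it down is to check whether the address and port it needs can be bound at all. The following is a generic sketch using only the standard library; `can_bind` is a hypothetical helper, and port 5000 is used only because it is the controller port that appears in the logs above.

```python
import socket


def can_bind(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or "cannot assign requested address".
            return False
    return True


# False here would mean that something (for example a running controller) already holds the port.
print(can_bind("127.0.0.1", 5000))
```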

Dhaizei commented 9 months ago
 "history": [
            {
                "role": "user",
                "content": "Interact with a household to solve a task. Imagine you are an intelligent agent in a household environment and your target is to perform actions to complete the task goal. At the beginning of your interactions, you will be given the detailed description of the current environment and your goal to accomplish. For each of your turn, you will be given a list of actions which you can choose one to perform in this turn. You should choose from two actions: \"THOUGHT\" or \"ACTION\". If you choose \"THOUGHT\", you should first think about the current condition and plan for your future actions, and then output your action in this turn. Your output must strictly follow this format:\"THOUGHT: your thoughts.\n ACTION: your next action\n\"; If you choose \"ACTION\", you should directly output the action in this turn. Your output must strictly follow this format:\"ACTION: your next action\n\". After your each turn, the environment will give you immediate feedback based on which you plan your next few steps. if the environment output \"Nothing happened\", that means the previous action is invalid and you should try more options.\n Reminder: \n1. the action must be chosen from the given available actions. Any actions except provided available actions will be regarded as illegal. \n2. Think when necessary, try to act directly more in the process.\n\n"
            },
            {
                "role": "agent",
                "content": "OK. I'll follow your instructions and try my best to solve the task."
            },
            {
                "role": "user",
                "content": "Here is one example.\nYou are in the middle of a room. Looking quickly around you, you see a coffeetable 1, a diningtable 1, a drawer 4, a drawer 3, a drawer 2, a drawer 1, a dresser 1, a garbagecan 1, a sidetable 2, a sidetable 1, and a sofa 1.\nYour task is to: put two cellphone in sofa.\n"
            },
            {
                "role": "agent",
                "content": "THOUGHT: The task is to put two cellphone in sofa, so i need to first find a cellphone and put it on sofa. I'll start my search on coffeetables.\n ACTION: go to coffeetable 1\n"
            },
            {
                "role": "user",
                "content": "On the coffeetable 1, you see a box 1, a cellphone 3, and a plate 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: take cellphone 3 from coffeetable 1\n"
            },
            {
                "role": "user",
                "content": "You pick up the cellphone 3 from the coffeetable 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: go to sofa 1\n"
            },
            {
                "role": "user",
                "content": "On the sofa 1, you see a newspaper 2, a pillow 1, a remotecontrol 2, and a remotecontrol 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: put cellphone 3 in/on sofa 1\n"
            },
            {
                "role": "user",
                "content": "You put the cellphone 3 in/on the sofa 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: go to diningtable 1\n"
            },
            {
                "role": "user",
                "content": "On the diningtable 1, you see a cellphone 2, a keychain 2, a laptop 1, a statue 2, and a statue 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: take cellphone 2 from diningtable 1\n"
            },
            {
                "role": "user",
                "content": "You pick up the cellphone 2 from the diningtable 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: go to sofa 1\n"
            },
            {
                "role": "user",
                "content": "On the sofa 1, you see a cellphone 3, a newspaper 2, a pillow 1, a remotecontrol 2, and a remotecontrol 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: put cellphone 2 in/on sofa 1\n"
            },
            {
                "role": "user",
                "content": "You put the cellphone 2 in/on the sofa 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: Task succeeded.\n"
            },
            {
                "role": "user",
                "content": "Here is your task. You are in the middle of a room. Looking quickly around you, you see a armchair 1, a cabinet 4, a cabinet 3, a cabinet 2, a cabinet 1, a drawer 5, a drawer 4, a drawer 3, a drawer 2, a drawer 1, a dresser 1, a garbagecan 1, a safe 1, a shelf 12, a shelf 11, a shelf 10, a shelf 9, a shelf 8, a shelf 7, a shelf 6, a shelf 5, a shelf 4, a shelf 3, a shelf 2, a shelf 1, a sidetable 1, and a sofa 1.\nYour task is to: put two keychain in safe. "
            },
            {
                "role": "agent",
                "content": "ACTION: go to cabinet 1"
            },
            {
                "role": "user",
                "content": "The cabinet 1 is closed. "
            },
            {
                "role": "agent",
                "content": "ACTION: examine cabinet 1"
            },
            {
                "role": "user",
                "content": "The cabinet 1 is closed. "
            },
            {
                "role": "agent",
                "content": "ACTION: examine cabinet 1"
            },
            {
                "role": "user",
                "content": "The cabinet 1 is closed. "
            },
            {
                "role": "agent",
                "content": "ACTION: examine cabinet 1"
            },
            {
                "role": "user",
                "content": "The cabinet 1 is closed. AVAILABLE ACTIONS: examine cabinet 1\nexamine cabinet 2\ngo to armchair 1\ngo to cabinet 3\ngo to cabinet 4\ngo to drawer 1\ngo to drawer 2\ngo to drawer 3\ngo to drawer 4\ngo to drawer 5\ngo to dresser 1\ngo to garbagecan 1\ngo to safe 1\ngo to shelf 1\ngo to shelf 10\ngo to shelf 11\ngo to shelf 12\ngo to shelf 2\ngo to shelf 3\ngo to shelf 4\ngo to shelf 5\ngo to shelf 6\ngo to shelf 7\ngo to shelf 8\ngo to shelf 9\ngo to sidetable 1\ngo to sofa 1\ninventory\nlook\nopen cabinet 1\nopen cabinet 2\n"
            }
        ]        "history": [
            {
                "role": "user",
                "content": "Interact with a household to solve a task. Imagine you are an intelligent agent in a household environment and your target is to perform actions to complete the task goal. At the beginning of your interactions, you will be given the detailed description of the current environment and your goal to accomplish. For each of your turn, you will be given a list of actions which you can choose one to perform in this turn. You should choose from two actions: \"THOUGHT\" or \"ACTION\". If you choose \"THOUGHT\", you should first think about the current condition and plan for your future actions, and then output your action in this turn. Your output must strictly follow this format:\"THOUGHT: your thoughts.\n ACTION: your next action\n\"; If you choose \"ACTION\", you should directly output the action in this turn. Your output must strictly follow this format:\"ACTION: your next action\n\". After your each turn, the environment will give you immediate feedback based on which you plan your next few steps. if the environment output \"Nothing happened\", that means the previous action is invalid and you should try more options.\n Reminder: \n1. the action must be chosen from the given available actions. Any actions except provided available actions will be regarded as illegal. \n2. Think when necessary, try to act directly more in the process.\n\n"
            },
            {
                "role": "agent",
                "content": "OK. I'll follow your instructions and try my best to solve the task."
            },
            {
                "role": "user",
                "content": "Here is one example.\nYou are in the middle of a room. Looking quickly around you, you see a coffeetable 1, a diningtable 1, a drawer 4, a drawer 3, a drawer 2, a drawer 1, a dresser 1, a garbagecan 1, a sidetable 2, a sidetable 1, and a sofa 1.\nYour task is to: put two cellphone in sofa.\n"
            },
            {
                "role": "agent",
                "content": "THOUGHT: The task is to put two cellphone in sofa, so i need to first find a cellphone and put it on sofa. I'll start my search on coffeetables.\n ACTION: go to coffeetable 1\n"
            },
            {
                "role": "user",
                "content": "On the coffeetable 1, you see a box 1, a cellphone 3, and a plate 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: take cellphone 3 from coffeetable 1\n"
            },
            {
                "role": "user",
                "content": "You pick up the cellphone 3 from the coffeetable 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: go to sofa 1\n"
            },
            {
                "role": "user",
                "content": "On the sofa 1, you see a newspaper 2, a pillow 1, a remotecontrol 2, and a remotecontrol 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: put cellphone 3 in/on sofa 1\n"
            },
            {
                "role": "user",
                "content": "You put the cellphone 3 in/on the sofa 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: go to diningtable 1\n"
            },
            {
                "role": "user",
                "content": "On the diningtable 1, you see a cellphone 2, a keychain 2, a laptop 1, a statue 2, and a statue 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: take cellphone 2 from diningtable 1\n"
            },
            {
                "role": "user",
                "content": "You pick up the cellphone 2 from the diningtable 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: go to sofa 1\n"
            },
            {
                "role": "user",
                "content": "On the sofa 1, you see a cellphone 3, a newspaper 2, a pillow 1, a remotecontrol 2, and a remotecontrol 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: put cellphone 2 in/on sofa 1\n"
            },
            {
                "role": "user",
                "content": "You put the cellphone 2 in/on the sofa 1.\n"
            },
            {
                "role": "agent",
                "content": "ACTION: Task succeeded.\n"
            },
            _**{
                "role": "user",
                "content": "Here is your task. You are in the middle of a room. Looking quickly around you, you see a armchair 1, a cabinet 4, a cabinet 3, a cabinet 2, a cabinet 1, a drawer 5, a drawer 4, a drawer 3, a drawer 2, a drawer 1, a dresser 1, a garbagecan 1, a safe 1, a shelf 12, a shelf 11, a shelf 10, a shelf 9, a shelf 8, a shelf 7, a shelf 6, a shelf 5, a shelf 4, a shelf 3, a shelf 2, a shelf 1, a sidetable 1, and a sofa 1.\nYour task is to: put two keychain in safe. "
            },
            {
                "role": "agent",
                "content": "ACTION: go to cabinet 1"
            },
            {
                "role": "user",
                "content": "The cabinet 1 is closed. "
            },
            {
                "role": "agent",
                "content": "ACTION: examine cabinet 1"
            },
            {
                "role": "user",
                "content": "The cabinet 1 is closed. "
            },
            {
                "role": "agent",
                "content": "ACTION: examine cabinet 1"
            },
            {
                "role": "user",
                "content": "The cabinet 1 is closed. "
            },
            {
                "role": "agent",
                "content": "ACTION: examine cabinet 1"
            },
            {
                "role": "user",
                "content": "The cabinet 1 is closed. AVAILABLE ACTIONS: examine cabinet 1\nexamine cabinet 2\ngo to armchair 1\ngo to cabinet 3\ngo to cabinet 4\ngo to drawer 1\ngo to drawer 2\ngo to drawer 3\ngo to drawer 4\ngo to drawer 5\ngo to dresser 1\ngo to garbagecan 1\ngo to safe 1\ngo to shelf 1\ngo to shelf 10\ngo to shelf 11\ngo to shelf 12\ngo to shelf 2\ngo to shelf 3\ngo to shelf 4\ngo to shelf 5\ngo to shelf 6\ngo to shelf 7\ngo to shelf 8\ngo to shelf 9\ngo to sidetable 1\ngo to sofa 1\ninventory\nlook\nopen cabinet 1\nopen cabinet 2\n"
            }
        ]**_
The dataset seems to have a problem: why does another question appear after the final "Task succeeded"?
Longin-Yu commented 9 months ago

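This is a one-shot prompt. Every data item is preceded by one worked example, and the actual question is the part that comes after "Task succeeded". See the paper for the precise task definition.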
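To make that layout concrete, here is a minimal sketch (not AgentBench code) that splits a history like the one dumped above into the one-shot example and the actual task. It assumes the real task always begins at the user message starting with "Here is your task.", as in the dump; the helper name `split_one_shot` is hypothetical.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]

# Prefix seen in the dump above; assumed (not confirmed) to be stable across items.
TASK_MARKER = "Here is your task."


def split_one_shot(history: List[Message]) -> Tuple[List[Message], List[Message]]:
    """Return (instructions_and_example, actual_task) from a flat message history."""
    for i, msg in enumerate(history):
        if msg["role"] == "user" and msg["content"].startswith(TASK_MARKER):
            return history[:i], history[i:]
    # No marker found: treat everything as prompt/example, nothing as task.
    return history, []


# Shortened stand-in for the history shown above.
history = [
    {"role": "user", "content": "Here is one example.\nYour task is to: put two cellphone in sofa.\n"},
    {"role": "agent", "content": "ACTION: Task succeeded.\n"},
    {"role": "user", "content": "Here is your task. Your task is to: put two keychain in safe. "},
    {"role": "agent", "content": "ACTION: go to cabinet 1"},
]
example, task = split_one_shot(history)
print(len(example), len(task))
```

Everything in `example` is the demonstration that ends with "Task succeeded"; `task` holds the turns that are actually evaluated.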

Dhaizei commented 9 months ago

Understood, thank you!

Dhaizei commented 8 months ago

When evaluating with cg-std, I get the following error: Warning: agentlm-tuning/cg-std#19 failed with error START_FAILED {"detail":"Error: Worker not responding\n"} None

What causes this?

python -m src.start_task -a image

python -m src.assigner image

zhc7 commented 8 months ago

Our latest version should have fixed this problem. If it only happens occasionally, you can ignore it.

Dhaizei commented 8 months ago

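I believe I am using the latest version, V2, and it does not happen only occasionally. So far it shows up only on cg-std (three launches, three failures); dbbench, HH, and OS have not hit it yet.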

zhc7 commented 8 months ago

Could you try a git pull to get the latest changes on main? We attempted to fix this issue in a069c7.

Dhaizei commented 8 months ago

What usually causes INTERACT_FAILED? Does the interaction fail because the action and thought cannot be parsed from the output?

zhc7 commented 8 months ago

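Generally, INTERACT_FAILED means an unrecoverable error occurred during the interaction. This is usually unexpected, for example a network interruption or a worker disconnecting. If the action and thought cannot be parsed from the output, the task normally handles that in its own way, and it does not lead to FAILED.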

Dhaizei commented 8 months ago

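Thank you for the reply; the usual problems can all be resolved now. Did you also evaluate agentlm-13B and 6B (the models uploaded to Hugging Face)? In my run only 5 out of 50 were correct, and I am not sure whether I misconfigured something. Under the same conditions, gpt4 reaches 84%.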

zhc7 commented 8 months ago

Hi, the AgentLM evaluation data should be available in the AgentLM repo. We have not re-run those evaluations ourselves, so you may want to open an issue in their repo.

Dhaizei commented 7 months ago

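I found that inference with the 70b model is very slow. My idea is to split the dataset under test into 4 parts and start 4 model workers to run inference, which would speed things up. I hope this feature can be added.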
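The splitting step in this request can be sketched as follows. This is only an illustration of the idea, not an existing AgentBench feature: the helper name `shard_round_robin` is hypothetical, and how each shard would be handed to a model worker is left open.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_round_robin(items: Sequence[T], num_shards: int) -> List[List[T]]:
    """Split items into num_shards lists whose sizes differ by at most one."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Shard k takes items k, k + num_shards, k + 2 * num_shards, ...
    return [list(items[k::num_shards]) for k in range(num_shards)]


# Ten made-up sample indices spread over four workers.
indices = list(range(10))
shards = shard_round_robin(indices, 4)
for worker_id, shard in enumerate(shards):
    print(worker_id, shard)
```

Round-robin assignment keeps the shards balanced even when the number of samples is not a multiple of the worker count; contiguous chunks would work as well if sample order matters.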

qinhy14 commented 7 months ago

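Found the problem. When I have a fairly large download running, some processes fail to bind to an IP address, so they never start. Stopping the download fixes it; the root cause is network congestion.

Hello, I ran into this problem too, but I don't understand how you solved it.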

qinhy14 commented 7 months ago

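I found that whenever the task under test needs to start a Docker image, the task cannot be found, for example:
INFO: 127.0.0.1:57618 - "GET /api/get_indices?name=alfworld-dev HTTP/1.1" 400 Bad Request
At the end of start_task.py I added code to print the workers:
resp = requests.get("http://localhost:5000/api/list_workers")
print(resp.json(), flush=True)
For tasks that need to start a Docker image, this prints an empty result. Is there some parameter I need to configure? Could you please take a look? @zhc7 @Longin-Yu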
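A single print right after startup can come back empty simply because a worker has not registered yet, so it may help to poll for a while before concluding that nothing started. Below is a minimal sketch under stated assumptions: the URL is the one from the comment above, the response schema is unknown (the code only treats an empty or falsy JSON body as "no workers"), the helper names are hypothetical, and it uses the standard-library `urllib` instead of `requests`.

```python
import json
import time
import urllib.request
from typing import Any, Callable

# URL taken from the comment above; adjust to your controller address.
LIST_WORKERS_URL = "http://localhost:5000/api/list_workers"


def fetch_workers(url: str = LIST_WORKERS_URL) -> Any:
    """Fetch and decode the JSON body of the list_workers endpoint."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_for_workers(fetch: Callable[[], Any], timeout: float = 60.0, interval: float = 2.0) -> Any:
    """Call fetch() repeatedly until it returns a non-empty result or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            workers = fetch()
        except OSError:
            # Connection refused and similar errors: controller not reachable yet.
            workers = None
        if workers:
            return workers
        if time.monotonic() >= deadline:
            raise TimeoutError("no workers registered before the timeout")
        time.sleep(interval)


# Demo with a stand-in fetch: empty twice, then one made-up entry.
responses = iter([{}, {}, {"alfworld-dev": "placeholder"}])
result = wait_for_workers(lambda: next(responses), timeout=5.0, interval=0.01)
print(result)
```

Against a running controller the call would be `wait_for_workers(fetch_workers)`. If it still times out, the Docker-backed worker most likely failed to start at all, and its own logs are the next place to look.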

Dhaizei commented 7 months ago

Automatic startup requires Docker to be downloaded first.

qinhy14 commented 7 months ago

Automatic startup requires Docker to be downloaded first.

The Docker images have already been pulled.

docker image ls
REPOSITORY                      TAG      IMAGE ID       CREATED        SIZE
local-os/ubuntu                 latest   f4ce9fbc8dcf   31 hours ago   77.8MB
local-os/packages               latest   9b433cd81448   31 hours ago   1.03GB
local-os/default                latest   56353027bb39   31 hours ago   532MB
ubuntu                          latest   b6548eacb063   6 days ago     77.8MB
mysql                           latest   a3b6608898d6   6 weeks ago    596MB
longinyu/agentbench-alfworld    latest   222e3b70c43a   2 months ago   12.4GB
longinyu/agentbench-ltp         latest   667e217f5296   2 months ago   11GB
longinyu/agentbench-webshop     latest   5cfedd5769de   4 months ago   26.7GB
longinyu/agentbench-mind2web    latest   1c1a79a384ab   4 months ago   17.8GB
longinyu/agentbench-card_game   latest   780632123dae   4 months ago   7.97GB