THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.11k stars 145 forks source link

[Bug/Assistance] card game 测评 开源大模型 运行报错 failed with error INTERACT_FAILED {"detail":"Error: Worker not responding\n"} #147

Open moon-fall opened 2 months ago

moon-fall commented 2 months ago

Describe the bug 每次运行到cg-std#14 assigner报错

Warning: Qwen2-72B-Instruct/cg-std#14 failed with error INTERACT_FAILED {"detail":"Error: Worker not responding\n"} index=None status=<SampleStatus.RUNNING: 'running'> result=None history=None

start_task进程中打印

except sending except message sent

然后任务运行完也会只有error.jsonl结果

看了下大概是这个位置开始报错 https://github.com/THUDM/AgentBench/blob/57b982b10f782661b1346b2234c5ed463f6f85c3/src/server/tasks/card_game/server.py#L38 打印出client_socket中接受到的data 发现json串不完整 如下

[{"role": "user", "content": "This is a two-player battle game with four pet fish on each team. The types of fish may vary.\nEach fish has its 400 initial health, 200 attack power, active ability, and passive ability.\nYou can choose a live fish to use its active skill or normal attack (causing half of attack power as damage) on an enemy fish each round.\nWhen the conditions are met, the fish's passive ability will automatically trigger, regardless of whether it is chosen.\nYour fish's identity is initially hidden. The enemy can guess one of your fish's identity in each round. If the enemy guesses right, your fish's identity is revealed, and each of your fish will get 50 damage.\nThe victory condition is to have more fish alive at the end of the game.\n\nThe following are the four types of your pet fish:\n{'spray': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': 'AOE: Attack all enemies for 35% of its attack points.'}, 'flame': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}, 'eel': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': 'AOE: Attack all enemies for 35% of your attack points.'}, 'sunfish': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}}\n\nThe following are the four types of enemy's pet fish:\n{'spray': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': 'AOE: Attack all enemies for 35% of its attack points.'}, 'flame': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}, 'eel': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': 'AOE: Attack all enemies for 35% of your attack points.'}, 'sunfish': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}}\n\nPlay the game with me. In each round, you should output your thinking process, and return your move with following JSON format:\n{'pick_fish': 'pick an alive fish, you should give the name of the alive fish', 'action': 'choose from [normal, active]', 'target_position': \"target's position, you must choose from [0,3]\"}\n\nNotice! You must return your move in each round. Otherwise, you will be considered defeated."}, {"role": "agent", "content": "Okay, I will play the game with you according to the rules."}, {"role": "user", "content": "\nFollowing is the current game state:\n{\"You\": [{\"POSITION\": \"0\", \"ID\": \"spray\", \"HP\": \"400\", \"ATK\": \"200\"}, {\"POSITION\": \"1\", \"ID\": \"flame\", \"HP\": \"400\", \"ATK\": \"200\"}, {\"POSITION\": \"2\", \"ID\": \"eel\", \"HP\": \"400\", \"ATK\": \"200\"}, {\"POSITION\": \"3\", \"ID\": \"sunfish\", \"HP\": \"400\", \"ATK\": \"200\"}], \"Enemy\": [{\"POSITION\": \"0\", \"ID\": \"spray\", \"HP\": \"350\", \"ATK\": \"unknown\"}, {\"POSITION\": \"1\", \"ID\": \"unknown\", \"HP\": \"350\", \"ATK\": \"unknown\"}, {\"POSITION\": \"2\", \"ID\": \"unknown\", \"HP\": \"350\", \"ATK\": \"unknown\"}, {\"POSITION\": \"3\", \"ID\": \"unknown\", \"HP\": \"350\", \"ATK\": \"unknown\"}]}\nYour previous action: \n{'ACTION_FISH': 'None', 'ATK': 'None', 'TARGET': 'None'}\nEnemy's previous action:\n{'ACTION_FISH': 'None', 'ATK': 'None', 'TARGET': 'None'}\nPlease Output your next action.\n"}, {"role": "agent", "content": "Given the current game state, I notice that all of my fish are at full health and the enemy's fish have taken some damage, but their identities are unknown. I will choose to use the active ability of my'spray' fish to attack all enemies, dealing AOE damage. This will also trigger the 'Counter' passive ability of any enemy fish with a teammate's health below 30%, but since all of my fish are at full health, I won't be affected by the counter damage.\n\nMy move:\njson\n{\n \"pick_fish\": \"spray\",\n \"action\": \"active\",\n \"target_position\": \"all\"\n}\n\nHowever, since the target_position field requires a specific position and not \"all\", I will choose position 0 as a placeholder, knowing that the AOE attack will affect all enemies.\n\njson\n{\n \"pick_fish\": \"spray\",\n \"action\": \"active\",\n \"target_position\": 0\n}\n"}, {"role": "user", "content": "This is a two-player battle game with four pet fish on each team. The types of fish may vary.\nEach fish has its 400 initial health, 200 attack power, active ability, and passive ability.\nYou can choose a live fish to use its active skill or normal attack (causing half of attack power as damage) on an enemy fish each round.\nWhen the conditions are met, the fish's passive ability will automatically trigger, regardless of whether it is chosen.\nYour fish's identity is initially hidden. The enemy can guess one of your fish's identity in each round. If the enemy guesses right, your fish's identity is revealed, and each of your fish will get 50 damage.\nThe victory condition is to have more fish alive at the end of the game.\n\nThe following are the four types of your pet fish:\n{'spray': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': 'AOE: Attack all enemies for 35% of its attack points.'}, 'flame': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}, 'eel': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': 'AOE: Attack all enemies for 35% of your attack points.'}, 'sunfish': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}}\n\nThe following are the four types of enemy's pet fish:\n{'spray': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': 'AOE: Attack all enemies for 35% of its attack points.'}, 'flame': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}, 'eel': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': 'AOE: Attack all enemies for 35% of your attack points.'}, 'sunfish': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}}\n\nPlay the game with me. In each round, you should output your thinking process, and return your move with following JSON format:\n{'pick_fish': 'pick an alive fish, you should give the name of the alive fish', 'action': 'choose from [normal, active]', 'target_position': \"target's position, you must choose from [0,3]\"}\n\nNotice! You must return your move in each round. Otherwise, you will be considered defeated."}, {"role": "agent", "content": "Okay, I will play the game with you according to the rules."}, {"role": "user", "content": "\nFollowing is the current game state:\n{\"You\": [{\"POSITION\": \"0\", \"ID\": \"spray\", \"HP\": \"266\", \"ATK\": \"200\"}, {\"POSITION\": \"1\", \"ID\": \"flame\", \"HP\": \"266\", \"ATK\": \"200\"}, {\"POSITION\": \"2\", \"ID\": \"eel\", \"HP\": \"294\", \"ATK\": \"200\"}, {\"POSITION\": \"3\", \"ID\": \"sunfish\", \"HP\": \"294\", \"ATK\": \"200\"}], \"Enemy\": [{\"POSITION\": \"0\", \"ID\": \"spray\", \"HP\": \"216\", \"ATK\": \"unknown\"}, {\"POSITION\": \"1\", \"ID\": \"flame\", \"HP\": \"216\", \"ATK\": \"unknown\"}, {\"POSITION\": \"2\", \"ID\": \"unknown\", \"HP\": \"244\", \"ATK\": \"unknown\"}, {\"POSITION\": \"3\", \"ID\": \"unknown\", \"HP\": \"244\", \"ATK\": \"unknown\"}]}\nYour previous action: \n{'ACTION_FISH': '0', 'ATK': '70', 'TARGET': '[1, 2, 3]'}\nEnemy's previous action:\n{'ACTION_FISH': '0', 'ATK': '70', 'TARGET': '[1, 2, 3]'}\nPlease Output your next action.\n"}, {"role": "agent", "content": "Given the current game state, my'spray' fish has already revealed its identity and has taken some damage. The enemy's'spray' fish has also been revealed and has taken damage. My 'eel' and'sunfish' have not been revealed yet, and they have slightly more health than the rest of my team. The enemy has two unknown fish with more health than their revealed fish.\n\nMy strategy will be to target the enemy's revealed fish to reduce their health further and potentially trigger the 'Counter' passive ability of my 'flame' fish if one of my teammates' health drops below 30%. I will choose to use the 'active' ability of my 'flame' fish to attack one of the enemy's revealed fish, which will also increase its attack points by 140.\n\nMy move:\njson\n{\n \"pick_fish\": \"flame\",\n \"action\": \"active\",\n \"target_position\": 0\n}\n"}, {"role": "user", "content": "This is a two-player battle game with four pet fish on each team. The types of fish may vary.\nEach fish has its 400 initial health, 200 attack power, active ability, and passive ability.\nYou can choose a live fish to use its active skill or normal attack (causing half of attack power as damage) on an enemy fish each round.\nWhen the conditions are met, the fish's passive ability will automatically trigger, regardless of whether it is chosen.\nYour fish's identity is initially hidden. The enemy can guess one of your fish's identity in each round. If the enemy guesses right, your fish's identity is revealed, and each of your fish will get 50 damage.\nThe victory condition is to have more fish alive at the end of the game.\n\nThe following are the four types of your pet fish:\n{'spray': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': 'AOE: Attack all enemies for 35% of its attack points.'}, 'flame': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}, 'eel': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': 'AOE: Attack all enemies for 35% of your attack points.'}, 'sunfish': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}}\n\nThe following are the four types of enemy's pet fish:\n{'spray': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': 'AOE: Attack all enemies for 35% of its attack points.'}, 'flame': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}, 'eel': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': 'AOE: Attack all enemies for 35% of your attack points.'}, 'sunfish': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}}\n\nPlay the game with me. In each round, you should output your thinking process, and return your move with following JSON format:\n{'pick_fish': 'pick an alive fish, you should give the name of the alive fish', 'action': 'choose from [normal, active]', 'target_position': \"target's position, you must choose from [0,3]\"}\n\nNotice! You must return your move in each round. Otherwise, you will be considered defeated."}, {"role": "agent", "content": "Okay, I will play the game with you according to the rules."}, {"role": "user", "content": "\nFollowing is the current game state:\n{\"You\": [{\"POSITION\": \"0\", \"ID\": \"spray\", \"HP\": \"107\", \"ATK\": \"200\"}, {\"POSITION\": \"1\", \"ID\": \"flame\", \"HP\": \"182\", \"ATK\": \"340\"}, {\"POSITION\": \"2\", \"ID\": \"eel\", \"HP\": \"238\", \"ATK\": \"200\"}, {\"POSITION\": \"3\", \"ID\": \"sunfish\", \"HP\": \"238\", \"ATK\": \"200\"}], \"Enemy\": [{\"POSITION\": \"0\", \"ID\": \"spray\", \"HP\": \"166\", \"ATK\": \"unknown\"}, {\"POSITION\": \"1\", \"ID\": \"flame\", \"HP\": \"166\", \"ATK\": \"unknown\"}, {\"POSITION\": \"2\", \"ID\": \"eel\", \"HP\": \"194\", \"ATK\": \"unknown\"}, {\"POSITION\": \"3\", \"ID\": \"unknown\", \"HP\": \"194\", \"ATK\": \"unknown\"}]}\nYour previous action: \n{'ACTION_FISH': '1', 'ATK': '70', 'TARGET': '[1, 2, 3]'}\nEnemy's previous action:\n{'ACTION_FISH': '0', 'ATK': '70', 'TARGET': '[1, 2, 3]'}\nPlease Output your next action.\n"}, {"role": "agent", "content": "Given the current game state, my 'flame' fish has the highest attack power and has already revealed its identity. The enemy's'spray' and 'flame' fish have also been revealed, and they have more health than my'spray' fish. My 'eel' and'sunfish' have not been revealed yet, and they have more health than the rest of my team.\n\nMy strategy will be to target the enemy's revealed fish to reduce their health further and potentially trigger the 'Counter' passive ability of my 'flame' fish if one of my teammates' health drops below 30%. I will choose to use the 'active' ability of my 'flame' fish to attack one of the enemy's revealed fish, which will also increase its attack points by 140.\n\nMy move:\njson\n{\n \"pick_fish\": \"flame\",\n \"action\": \"active\",\n \"target_position\": 0\n}\n"}, {"role": "user", "content": "This is a two-player battle game with four pet fish on each team. The types of fish may vary.\nEach fish has its 400 initial health, 200 attack power, active ability, and passive ability.\nYou can choose a live fish to use its active skill or normal attack (causing half of attack power as damage) on an enemy fish each round.\nWhen the conditions are met, the fish's passive ability will automatically trigger, regardless of whether it is chosen.\nYour fish's identity is initially hidden. The enemy can guess one of your fish's identity in each round. If the enemy guesses right, your fish's identity is revealed, and each of your fish will get 50 damage.\nThe victory condition is to have more fish alive at the end of the game.\n\nThe following are the four types of your pet fish:\n{'spray': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': 'AOE: Attack all enemies for 35% of its attack points.'}, 'flame': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}, 'eel': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': 'AOE: Attack all enemies for 35% of your attack points.'}, 'sunfish': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}}\n\nThe following are the four types of enemy's pet fish:\n{'spray': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': 'AOE: Attack all enemies for 35% of its attack points.'}, 'flame': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}, 'eel': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': 'AOE: Attack all enemies for 35% of your attack points.'}, 'sunfish': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}}\n\nPlay the game with me. In each round, you should output your thinking process, and return your move with following JSON format:\n{'pick_fish': 'pick an alive fish, you should give the name of the alive fish', 'action': 'choose from [normal, active]', 'target_position': \"target's position, you must choose from [0,3]\"}\n\nNotice! You must return your move in each round. Otherwise, you will be considered defeated."}, {"role": "agent", "content": "Okay, I will play the game with you according to the rules."}, {"role": "user", "content": "This is a two-player battle game with four pet fish on each team. The types of fish may vary.\nEach fish has its 400 initial health, 200 attack power, active ability, and passive ability.\nYou can choose a live fish to use its active skill or normal attack (causing half of attack power as damage) on an enemy fish each round.\nWhen the conditions are met, the fish's passive ability will automatically trigger, regardless of whether it is chosen.\nYour fish's identity is initially hidden. The enemy can guess one of your fish's identity in each round. If the enemy guesses right, your fish's identity is revealed, and each of your fish will get 50 damage.\nThe victory condition is to have more fish alive at the end of the game.\n\nThe following are the four types of your pet fish:\n{'spray': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': 'AOE: Attack all enemies for 35% of its attack points.'}, 'flame': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}, 'eel': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': 'AOE: Attack all enemies for 35% of your attack points.'}, 'sunfish': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}}\n\nThe following are the four types of enemy's pet fish:\n{'spray': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': 'AOE: Attack all enemies for 35% of its attack points.'}, 'flame': {'passive': \"Counter: Deal 30 damage to attacker when a teammate's health is below 30%. \", 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}, 'eel': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': 'AOE: Attack all enemies for 35% of your attack points.'}, 'sunfish': {'passive': 'Deflect: Distribute 70% damage to teammates and takes 30% when attacked. Gains 40 attack points after taking 200 damage accumulated. ', 'active': \"Infight: Attack one alive teammate for 75 damage and increases your attack points by 140. Notice! You can't attack yourself or dead teamate! \"}}\n\nPlay the game with me. In each round, you should output your thinking process, and return your move with following JSON format:\n{'pick_fish': 'pick an alive fish, you should give the name of the alive fish', 'action': 'choose from [normal, active]', 'target_position': \"target's position, you must choose from [0,3]\"}\n\nNotice! You must return your move in each round. Otherwise, you will be considered defeated."}, {"role": "agent", "content": "Okay, I will play the game with you according to the rules."}, {"role": "user", "content": "\nFollowing is the current game state:\n{\"You\": [{\"POSITION\": \"0\", \"ID\": \"spray\", \"HP\": \"0\", \"ATK\": \"200\"}, {\"POSITION\": \"1\", \"ID\": \"flame\", \"HP\": \"182\", \"ATK\": \"480\"}, {\"POSITION\": \"2\", \"ID\": \"eel\", \"HP\": \"238\", \"ATK\": \"200\"}, {\"POSITION\": \"3\", \"ID\": \"sunfish\", \"HP\": \"238\", \"ATK\": \"200\"}], \"Enemy\": [{\"POSITION\": \"0\", \"ID\": \"

Screenshots or Terminal Copy&Paste 0608962dbf58e51e6470216f1cedb6f4

Desktop (please complete the following information):

用的qwen2 72b assign配置AgentBench\configs\assignments\default.yaml如下

import: definition.yaml

concurrency: task: kg-std: 1 agent: Qwen2-72B-Instruct: 1

assignments: # List[Assignment] | Assignment

output: "outputs/{TIMESTAMP}"