THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.23k stars 159 forks source link

[Bug/Assistance] #154

Open matinaghaei opened 3 months ago

matinaghaei commented 3 months ago

This issue is related to a previous one.

For the Knowledge Graph task, the agent seems to be providing the correct answer but the feedback insists that the answer is wrong. Consequently, I'm getting a zero score even using GPT-4.

Here is part of the output from the run.jsonl file for the KG task:

{"role": "user", "content": "A new question: Which suprbowls were won by an NFL team that played in the New York Metropolitan area?\nEntities: [suprbowls, New York metropolitan area]"}, {"role": "agent", "content": "Thought: I need to first find NFL teams that played in the New York Metropolitan area. To achieve this, I will query the KB to find relations connected to the entity 'New York metropolitan area'.\nAction: get_relations(New York metropolitan area)"}, {"role": "user", "content": "get_relations(New York metropolitan area) cannot be executed. You may make a mistake and need to fix it."}

I'm also having a similar issue with the LTP task (I got zero score using GPT-4). I'm providing one line from the run.jsonl file for the LTP task down below:

{"index": 48, "error": null, "info": null, "output": {"index": 48, "status": "task limit reached", "result": {"accuracy": 0.32, "efficiency": 0.0, "relevance": 1.0, "progress": 0.0, "finish_round": 25, "finished": false, "hit_keys": {}, "history": ["Story: The classmates celebrated Xiao Ming's birthday, but under the watchful eyes of everyone, Xiao Ming was stabbed to death by a sharp weapon, please reason.", "Truth: Some cakes need to be shaped, and they will be fixed with bamboo sticks or similar sharp objects. In order to surprise Xiao Ming, the classmate who ordered the cake ordered a cake with a strange shape, which contained a weapon. However, some students didn't know that after singing the birthday song to Xiao Ming, they put Xiao Ming's head into the cake and pierced his head with a sharp weapon.", "Round 1: Was the stabbing of Xiao Ming intentional?", "No.", "Round 2: Was the stabbing an accident?", "Yes.", "Round 3: Was the sharp weapon something that is typically used for cutting a birthday cake?", "No.", "Round 4: Was the sharp weapon an object that is commonly found at a birthday party?", "No.", "Round 5: Was the sharp weapon brought to the party by someone?", "Yes.", "Round 6: Was the person who brought the sharp weapon aware that it was dangerous?", "No.", "Round 7: Was the sharp weapon disguised as something else?", "Yes.", "Round 8: Was the sharp weapon disguised as a gift?", "No.", "Round 9: Was the sharp weapon disguised as a decoration?", "Yes.", "Round 10: Did the sharp weapon accidentally stab Xiao Ming during the process of setting up or handling the decoration?", "No.", "Round 11: Did the sharp weapon accidentally stab Xiao Ming during the celebration activities?", "Yes.", "Round 12: Was the sharp weapon part of a pi\u00f1ata or similar party game?", "No.", "Round 13: Was the sharp weapon part of a costume or prop used during the celebration?", "No.", "Round 14: Was the sharp weapon part of a balloon or something related to balloons?", "No.", "Round 15: Was the sharp weapon part of a party favor or something handed out to guests?", "No.", "Round 16: Was the sharp weapon part of a centerpiece or table decoration?", "No.", "Round 17: Was the sharp weapon part of a wall or ceiling decoration?", "No.", "Round 18: Was the sharp weapon part of a game or activity that involved movement or physical interaction?", "No.", "Round 19: Was the sharp weapon part of a food or drink item?", "Yes.", "Round 20: Was the sharp weapon hidden inside a food item?", "Yes.", "Round 21: Was the sharp weapon hidden inside a cake?", "Yes.", "Round 22: Did Xiao Ming accidentally stab himself while cutting or eating the cake?", "No.", "Round 23: Did someone else accidentally stab Xiao Ming while cutting or serving the cake?", "No.", "Round 24: Did Xiao Ming accidentally stab himself while eating the cake?", "No.", "Round 25: Did Xiao Ming accidentally get stabbed while someone else was eating the cake?", "No."]}, "history": [{"role": "user", "content": "You are a game player, and you are playing Lateral Thinking Puzzle, also known as Situation Puzzle.\nLateral Thinking Puzzle is a deductive reasoning game, and here are the game rules:\n1. At the beginning of the game, you will receive a narrative, referred to as \"story\". Based on the story, you need to ask questions that can be answered with \"yes\", \"no\", or \"irrelevant\" to guess out the \"truth\".\n2. By asking questions, you narrow down the range of possibilities until you eventually guess out the truth.\n3. Each time, you can only ask one question.\n4. Remember that your role is a player. You cannot declare the end of the game, give up on reasoning, or request a new game.\n5. You cannot directly repeat information already provided in the story.\n6. You cannot directly ask for details about the story in the form of \"why\" questions; you need to make your own guesses for truth.\n7. You cannot directly inquire about the story; you must make your own deductions.\n\nNext, please make full use of the information provided above to engage in game reasoning. Keep in mind that your questions should be answerable with \"yes\", \"no\", or \"irrelevant\", and you can only ask one question at a time.\nHere is your story:\nThe classmates celebrated Xiao Ming's birthday, but under the watchful eyes of everyone, Xiao Ming was stabbed to death by a sharp weapon, please reason.\n\nYou can start guessing the content of the truth, and I will answer your questions. Please note that your questions should be answerable with \"yes\", \"no\", or \"irrelevant\"."}, {"role": "agent", "content": "Was the stabbing of Xiao Ming intentional?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Was the stabbing an accident?"}, {"role": "user", "content": "Yes."}, {"role": "agent", "content": "Was the sharp weapon something that is typically used for cutting a birthday cake?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Was the sharp weapon an object that is commonly found at a birthday party?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Was the sharp weapon brought to the party by someone?"}, {"role": "user", "content": "Yes."}, {"role": "agent", "content": "Was the person who brought the sharp weapon aware that it was dangerous?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Was the sharp weapon disguised as something else?"}, {"role": "user", "content": "Yes."}, {"role": "agent", "content": "Was the sharp weapon disguised as a gift?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Was the sharp weapon disguised as a decoration?"}, {"role": "user", "content": "Yes."}, {"role": "agent", "content": "Did the sharp weapon accidentally stab Xiao Ming during the process of setting up or handling the decoration?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Did the sharp weapon accidentally stab Xiao Ming during the celebration activities?"}, {"role": "user", "content": "Yes."}, {"role": "agent", "content": "Was the sharp weapon part of a pi\u00f1ata or similar party game?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Was the sharp weapon part of a costume or prop used during the celebration?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Was the sharp weapon part of a balloon or something related to balloons?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Was the sharp weapon part of a party favor or something handed out to guests?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Was the sharp weapon part of a centerpiece or table decoration?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Was the sharp weapon part of a wall or ceiling decoration?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Was the sharp weapon part of a game or activity that involved movement or physical interaction?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Was the sharp weapon part of a food or drink item?"}, {"role": "user", "content": "Yes."}, {"role": "agent", "content": "Was the sharp weapon hidden inside a food item?"}, {"role": "user", "content": "Yes."}, {"role": "agent", "content": "Was the sharp weapon hidden inside a cake?"}, {"role": "user", "content": "Yes."}, {"role": "agent", "content": "Did Xiao Ming accidentally stab himself while cutting or eating the cake?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Did someone else accidentally stab Xiao Ming while cutting or serving the cake?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Did Xiao Ming accidentally stab himself while eating the cake?"}, {"role": "user", "content": "No."}, {"role": "agent", "content": "Did Xiao Ming accidentally get stabbed while someone else was eating the cake?"}]}, "time": {"timestamp": 1721413168803, "str": "2024-07-19 14:19:28"}}