THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.03k stars 138 forks source link

OS-Interaction output of os env has many Escape Sequence, Not suitable for human reading #54

Closed simonjoe246 closed 9 months ago

simonjoe246 commented 9 months ago

Every output of os env occurs,I am not sure whether it is normal or not, and if it affect model generation.

{
    "history": [
        {
            "role": "user",
            "content": "Now, I will start a new problem in a new OS. My problem is:\n\nFind out count of users."
        },
        {
            "role": "agent",
            "content": "Think: To find the count of users, I need to run the `cat` command on the `/etc/passwd` file.\n\nAct: bash\n\n```bash\ncat /etc/passwd | wc -l\n```"
        },
        {
            "role": "user",
            "content": "The output of the OS:\n\n21\r\n\u001b[?2004h\u001b]0;root@59d75ad36c35: /\u0007root@59d75ad36c35:/# "
        },
        {
            "role": "agent",
            "content": "Think: The output has been truncated because it is too long, so I need to count the lines in the output.\n\nAct: bash\n\n```bash\nwc -l /etc/passwd | awk '{s+=$1} END {print s}'\n```"
        },
        {
            "role": "user",
            "content": "The output of the OS:\n\n21\r\n\u001b[?2004h\u001b]0;root@59d75ad36c35: /\u0007root@59d75ad36c35:/# \u001b[K"
        },
        {
            "role": "agent",
            "content": "Think: Now I get the answer, it is 21.\n\nAct: answer(21)"
        }
    ],
    "exception_raised": false
}
Longin-Yu commented 9 months ago

That's normal. We found that GPT4 can solve the problems normally. It may be worthy considering the method of tokenization for these escape characters.

Longin-Yu commented 9 months ago

Feel free to reopen if any problem still exists.