THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.15k stars, 150 forks

Play AlfWorld with GPT-3.5-turbo #38

Closed: Hua-rookie closed this issue 11 months ago

Hua-rookie commented 1 year ago

I tried to play AlfWorld in the Docker container provided by AgentBench, using the following commands:

export GPT_TURBO_SERVER_URL="http://40.74.217.35:10012/api/openai/chat-completion"
export GPT_TURBO_SERVER_AUTHORIZATION="7606d41c54e4236ff492ef8445e42cde"
python evaluate.py --task configs/tasks/<your_task>.yaml --agent configs/agents/local/turbo.yaml --workers 20
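Before launching evaluate.py, it can help to confirm that both exported variables are actually visible to the Python process. The sketch below is a hypothetical pre-flight check: the variable names come from the commands above, but whether configs/agents/local/turbo.yaml reads exactly these variables is an assumption, and the helper name missing_env_vars is illustrative only.

```python
import os

# Variable names taken from the export commands above. Whether the agent config
# actually reads them is an assumption, not something confirmed in this thread.
REQUIRED_VARS = ("GPT_TURBO_SERVER_URL", "GPT_TURBO_SERVER_AUTHORIZATION")


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Set these before running evaluate.py:", ", ".join(missing))
    else:
        print("All required variables are set.")
```

Running it in the same shell (or container) as evaluate.py matters, since variables exported in one shell are not visible in another.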

However, every game failed, with "output": {"log": [{"round": 1, "output": "", "action": "", "observation": "Nothing happens.", "done": false} in every round.
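In the excerpt, both "output" and "action" are empty, which may indicate (this is an interpretation, not confirmed in the thread) that no model reply reached the agent, so the environment only ever answered "Nothing happens.". The minimal sketch below counts such rounds. It assumes each run's log is a list of round objects shaped like the excerpt; the layout of the full results file is not shown, and empty_output_rounds is a hypothetical helper name.

```python
import json


# Assumes each log is a list of dicts like the excerpt above:
# {"round": 1, "output": "", "action": "", "observation": "...", "done": false}
def empty_output_rounds(log):
    """Return the round numbers whose model output is missing, None, or blank."""
    return [
        entry.get("round")
        for entry in log
        if not (entry.get("output") or "").strip()
    ]


sample = json.loads(
    '[{"round": 1, "output": "", "action": "", '
    '"observation": "Nothing happens.", "done": false}]'
)
print(empty_output_rounds(sample))  # -> [1]
```

If every round of every game comes back from this check, the problem most likely lies in the connection to the model server rather than in AlfWorld itself.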

Why does this happen, and how can I fix it?

Longin-Yu commented 11 months ago

Thanks for your interest. Please try our new version; it is simpler to get started with.