[Feature] Use for benchmarking agents like AutoGPT?

THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

https://llmbench.ai

Apache License 2.0

2.01k stars 136 forks source link

[Feature] Use for benchmarking agents like AutoGPT? #118

Closed shruti222patel closed 4 months ago

shruti222patel commented 4 months ago

Hi,

Can this repo be used for testing and benchmarking general agent frameworks like AutoGPT? Are there docs around this or could you point me to the code files I would need to update to make this work?

Thanks!

zhc7 commented 4 months ago

Hi, @shruti222patel . Yes, it is possible. Writing a new client for each agent framework should be sufficient. You may refer to https://github.com/THUDM/AgentBench/blob/main/docs/Introduction_en.md#23-client. If this does not meet your requirement, you might have to change the core interaction logic in src/assigner.py.