THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.03k stars, 138 forks

[Feature Request] Add more difficult data in the DB task, such as Spider1.0 #47

Closed: iamlockelightning closed this issue 8 months ago

iamlockelightning commented 10 months ago

Thank you so much for publishing such an elegant framework for evaluating LLM Agents.

Would you consider adding more difficult data to the DB task? As far as I can see, the task only contains single-table queries, which are easy to solve and leave a gap with real-world cases.

There are other high-quality datasets, such as Spider 1.0, that contain complex queries (multi-table joins, etc.).

Hope to see more complex SQL data in this task. 👍

Xiao9905 commented 8 months ago

@iamlockelightning Thanks for your interest in AgentBench and for the great suggestion! We'd love to add more challenging samples to the DB task as part of our future work plan. In the meantime, we encourage you to contribute to the next version of AgentBench if you'd like. Please feel free to open a PR to add data, tasks, or environments!