THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0

[Assistance] Need some example running logs #103

Open ROCKYWWWW opened 6 months ago

ROCKYWWWW commented 6 months ago

Could you release some running logs, such as conversation histories, for us to use as a reference? We could use them to investigate the gap between our scores on some tasks and the results reported on the leaderboard.

zhc7 commented 5 months ago

Hi @ROCKYWWWW, do you have any specific needs? Collating and publishing the running logs for all the models we evaluated would require a huge amount of work.

ROCKYWWWW commented 5 months ago

If possible, please release the running logs for GPT-4 on the 8 tasks. That would be very helpful.