THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.15k stars 150 forks source link

[Feature] 请问每个任务的分是怎么计算的呢?比如OS任务中得到的只是一个准确率,但是在论文中Table3每个任务对应的都是分数,这中间的映射过程我在文中并没有找到,可以提示一下吗 #135

Open lonerFarea opened 4 months ago

lonerFarea commented 4 months ago

Is your feature request related to a problem? Please describe. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like A clear and concise description of what you want to happen.

Additional context Add any other context or screenshots about the feature request here.

zhc7 commented 4 months ago

hi, @lonerFarea 分数的计算方式就是根据Table2的权重进行加权平均