THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
1.99k stars
135 forks
issues
Problems encountered when deploying a local model via fastchat
#146
YinSonglin1997
opened
1 day ago
0
DBbench-std task with error "Can't connect to MySQL server"
#145
realbillbao
opened
1 week ago
1
urgent - if one of the problems throws an error, why does the overall.json not show up??
#144
ishapuri
opened
2 weeks ago
0
Fix typo in os agent instruction
#143
rjmoss
opened
2 weeks ago
0
Are the trajectories publicly available?
#142
yananchen1989
opened
3 weeks ago
0
[Feature] Add a LICENSE to the project
#141
cjoverbay
closed
4 weeks ago
2
Stupidd cupid patch 1
#140
StupiddCupid
closed
1 month ago
0
Zifei
#139
StupiddCupid
closed
1 month ago
3
Please check my problem description and corresponding check code
#137
StupiddCupid
closed
1 month ago
0
Would llama3, wizardlm2 and other latest models be tested and published in leaderboard? Request to add test results for llama3, wizardlm, and other large models released in April-May 2024
#136
dercaft
opened
1 month ago
1
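[Feature] How is the score for each task calculated? For example, the OS task yields only an accuracy, but Table 3 in the paper lists a score for each task, and I could not find the mapping between the two in the paper. Could you give a hint?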
#135
lonerFarea
opened
1 month ago
1
Fix typo in README.md
#134
petrgazarov
closed
1 month ago
0
How do I test with a local llama-2-hf model? I would appreciate some clear guidance! [Bug/Assistance]
#133
5456es
closed
2 months ago
1
Is testing with OpenAI's tool_call interface supported?
#132
Maybewuss
opened
2 months ago
1
Excellent Job! Well, no offense, it seems LLM-Bench rather than AgentBench in essence.
#130
Konisberg
opened
3 months ago
1
[Bug/Assistance] What is going on with the unknown in mind2web?
#129
Tangent-90C
opened
3 months ago
1
OS std test set results
#128
webdxq
opened
3 months ago
1
[Bug/Assistance] - Reproducing Results on Alfworld (HH) (vs. ReAct paper)
#127
ai-nikolai
opened
3 months ago
4
Add evaluation of Claude3
#126
webdxq
opened
4 months ago
2
format all files using black
#125
EYH0602
closed
4 months ago
0
Connection error
#124
StupiddCupid
closed
3 months ago
3
Fix Execution Permission Issue and Adjust LTP Task Rounds
#123
Taishi-N324
closed
4 months ago
2
Benchmark for mistral models
#122
mingxuan-he
opened
4 months ago
1
The Card_Game task fails to run
#121
yupeijei1997
opened
4 months ago
4
Fix "Task does not exist" caused by connection problems between the container and the host controller
#120
Tangent-90C
closed
4 months ago
3
How should I solve this problem? I am running mind2web and am not sure how to operate this task. Could you give some specific guidance? Thanks
#119
Ethan-2004
opened
4 months ago
17
[Feature] Use for benchmarking agents like AutoGPT?
#118
shruti222patel
closed
4 months ago
1
Update README.md
#117
Longin-Yu
closed
4 months ago
0
[Bug/Assistance] Questions in the runs.jsonl file from a kg-std task run cannot be found in the dataset
#116
13416157913
closed
4 months ago
4
[Bug/Assistance] When testing the kg-std task, every status in the output file is task limit reached
#115
13416157913
opened
5 months ago
1
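[Bug/Assistance] Why does the dbbench task create only a single database named unkown in MySQL, containing only one table that is also named unkown? Is there a problem with initialization?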
#114
13416157913
closed
5 months ago
1
[Bug/Assistance] Testing the os-std task reports Message: 0 samples remaining.
#113
13416157913
closed
5 months ago
6
[Bug/Assistance] OS task raises AttributeError: 'NpipeSocket' object has no attribute '_sock'
#112
13416157913
closed
5 months ago
2
[Bug/Assistance] "result": {"answer": "1049 (42000): Unknown database 'Football Matches'", "type": "UPDATE", "error"
#111
13416157913
closed
5 months ago
1
ltp fails to start
#110
Fu-Dayuan
opened
5 months ago
1
[Bug/Assistance]
#109
ibingzhaoi
opened
5 months ago
5
dbbench-std: Task Output Seems Correct But MD5 Mismatches
#108
wchen-github
opened
5 months ago
1
Can agentbench run on the training set?
#107
Fu-Dayuan
opened
5 months ago
1
[Bug/Assistance] DBBench Unknown database
#106
LittleWhite0208
opened
5 months ago
1
[Bug/Assistance] os-std: one data sample fails with Worker not responding
#105
Xccanxin
opened
5 months ago
1
Generating the package image hangs after selecting the time zone. What is going on? Regenerating does not help either
#104
lidian1234
closed
5 months ago
0
[Assistance] Need some example running logs
#103
ROCKYWWWW
opened
5 months ago
2
[Bug/Assistance] How to configure configs/agents/openai-chat.yaml
#102
yananchen1989
closed
6 months ago
1
Why is there no overall.json in the output folder?
#101
tml2002
closed
6 months ago
0
Why is there no overall.json in the output folder?
#100
tml2002
closed
6 months ago
0
[Bug/Assistance]
#99
tml2002
closed
6 months ago
0
[Bug/Assistance]
#98
tml2002
closed
6 months ago
0
Both cg and kg hit Worker not responding
#97
WarBean
opened
6 months ago
1
Game task fails to start [Assistance]
#96
smartliuhw
opened
6 months ago
3
Update Config_en.md
#95
ZiyueWang25
closed
4 months ago
0