THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.23k stars
159 forks
issues
[Assistance] OS task return info
#173
xiaxiaxiatengxi
opened
4 days ago
0
The initial prototype of the FHIRTask class inspired from Extend AgentBench Section.
#172
dannyslowpark
opened
4 days ago
0
pull request
#171
genglongling
opened
1 week ago
1
[Bug/Assistance] '{"detail":"Error: Task does not exist"}', 400, 'webshop-std'
#170
AlphaLee1113
opened
1 month ago
0
[Data availability] Model trajectories
#169
felipemaiapolo
opened
1 month ago
0
[Assistance] How to reproduce the effect shown in the demo video
#168
XGJ111
opened
1 month ago
0
In the webshop scenario, why do some searches return no results, causing the task to fail?
#167
kai0705
opened
2 months ago
0
[Feature] Some questions about the game-scenario docker: errors related to http://nginx.org/r/error_log. Is this caused by docker not having external network access?
#166
kai0705
opened
2 months ago
0
Update README.md
#165
Xiao9905
closed
3 months ago
0
OS-task catch errors in container init
#164
rjmoss
closed
1 week ago
0
Fixed hanging bash commands from agent in os-task
#163
rjmoss
closed
1 week ago
0
Fixed terminal output parsing
#162
rjmoss
closed
1 week ago
0
where can we find the api of sparql?
#161
cssx1234
closed
3 months ago
0
I have deployed the kg service, but the kg task still cannot be evaluated properly; the specific error is as follows
#160
minleminzui
closed
3 months ago
1
[Bug/Assistance] kg-std issues
#159
night-chen
closed
3 months ago
1
Update README.md for local deployment of KG service
#158
finger1517
closed
3 months ago
0
Update README.md for local deployment of KG service
#157
finger1517
closed
3 months ago
0
Add Quitting for OS task
#156
dillonmsandhu
closed
3 months ago
0
Any plans to add new models?
#155
ryoungj
opened
3 months ago
1
[Bug/Assistance]
#154
matinaghaei
opened
3 months ago
0
[Bug/Assistance] For the kg task, the server at http://164.107.116.56:3093/sparql appears to be down; running python src/server/tasks/knowledgegraph/utils/sparql_executer.py times out
#153
minleminzui
closed
3 months ago
3
Could you please upload the dockerfile?
#152
HCHCXY
opened
4 months ago
3
[Bug/Assistance] A lot of os-std tasks are impossible
#151
rjmoss
opened
4 months ago
0
[Bug/Assistance] how to use local model to replace gpt3.5?
#150
lambda7xx
opened
4 months ago
2
fix: fix AgentBench/data/os_interaction/data/4/ N11.json
#149
minleminzui
opened
4 months ago
0
[Feature] Which number is the final kg score? There are three metrics (F1, Exact Match, and Executability). Or is it a weighted combination of them? I did not find a weighting formula
#148
minleminzui
closed
4 months ago
2
[Bug/Assistance] Evaluating an open-source LLM on card game fails with error INTERACT_FAILED {"detail":"Error: Worker not responding\n"}
#147
moon-fall
opened
4 months ago
0
Problems encountered when deploying a local model via fastchat
#146
YinSonglin1997
opened
4 months ago
12
DBbench-std task with error "Can't connect to MySQL server"
#145
realbillbao
opened
4 months ago
2
urgent - if one of the problems throws an error, why does the overall.json not show up?
#144
ishapuri
opened
5 months ago
0
Fix typo in os agent instruction
#143
rjmoss
closed
4 months ago
0
Are the trajectories publicly available?
#142
yananchen1989
opened
5 months ago
0
[Feature] Add a LICENSE to the project
#141
cjoverbay
closed
5 months ago
2
Stupidd cupid patch 1
#140
StupiddCupid
closed
5 months ago
0
Zifei
#139
StupiddCupid
closed
5 months ago
3
Please check my problem description and corresponding check code
#137
StupiddCupid
closed
6 months ago
0
Would llama3, wizardlm2, and other latest models be tested and published on the leaderboard? Request to add test results for LLMs released in April-May 2024, such as llama3 and wizardlm
#136
dercaft
opened
6 months ago
3
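[Feature] How is the score for each task computed? For example, the OS task yields only an accuracy, but Table 3 in the paper reports a score for each task. I could not find the mapping between the two in the paper; could you give a hint?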
#135
lonerFarea
opened
6 months ago
1
Fix typo in README.md
#134
petrgazarov
closed
5 months ago
0
How can I test with a local llama-2-hf model? I would appreciate some clear guidance! [Bug/Assistance]
#133
5456es
closed
6 months ago
1
Is testing with OpenAI's tool_call interface supported?
#132
Maybewuss
opened
7 months ago
1
Excellent Job! Well, no offense, it seems LLM-Bench rather than AgentBench in essence.
#130
Konisberg
opened
8 months ago
1
[Bug/Assistance] What does "unknown" mean in mind2web?
#129
Tangent-90C
opened
8 months ago
1
OS std test set results
#128
xqun3
opened
8 months ago
1
[Bug/Assistance] - Reproducing Results on Alfworld (HH) (vs. ReAct paper)
#127
ai-nikolai
opened
8 months ago
4
Add evaluation for Claude 3
#126
xqun3
opened
8 months ago
2
format all files using black
#125
EYH0602
closed
8 months ago
0
Connection error
#124
StupiddCupid
closed
8 months ago
3
Fix Execution Permission Issue and Adjust LTP Task Rounds
#123
Taishi-N324
closed
8 months ago
2
Benchmark for mistral models
#122
mingxuan-he
opened
8 months ago
1