THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.01k stars
136 forks
issues
Game task fails to start [Assistance]
#96
smartliuhw
opened
6 months ago
3
Update Config_en.md
#95
ZiyueWang25
closed
4 months ago
0
Update README.md
#94
ZiyueWang25
closed
6 months ago
0
Is it possible to set up the environment without Docker?
#93
smartliuhw
closed
6 months ago
2
I would like to look at the interaction functions between the agent and the server; could you point me to them?
#92
hushuang909
closed
5 months ago
2
About Webshop
#91
dapengchen1234
closed
5 months ago
1
Fix typo: AgentClient.reference --> AgentClient.inference
#90
BarryRun
closed
6 months ago
0
[Bug/Assistance] DBbench task evaluation results are inconsistent with the leaderboard
#89
SummerXIATIAN
opened
6 months ago
1
Confirming dataset information for the KBQA task
#88
WuXuan374
closed
6 months ago
0
Not a single cg task sample succeeds, and the task server receives no messages
#87
Jianzhao-Huang
opened
6 months ago
1
[Assistance] Connection Error
#86
wz1211
closed
6 months ago
1
[Bug/Assistance] The option link fails to jump
#85
zhimin-z
opened
7 months ago
0
Error with Command “python -m src.start_task -a”
#84
ericzdzhang
closed
5 months ago
5
How to test on custom data?
#83
Reason-Wang
closed
6 months ago
1
Hello, are messages sent to all LLMs in the tests in the format {role:user/assistant,content:}?
#82
pfx546746447
closed
6 months ago
3
Separate server for task and model
#81
Reason-Wang
closed
7 months ago
2
[Assistance] How to get the score for each task?
#80
Jiaqi0109
closed
7 months ago
1
How to calculate the overall score?
#79
zhimin-z
closed
7 months ago
1
Error when running AgentBench
#78
QingChengLineOne
closed
7 months ago
2
I want to change the API interface to ChatGLM3; how should I do that?
#77
QingChengLineOne
closed
7 months ago
5
[Assistance] How to change the prompt in the task
#76
Z-ZHHH
closed
7 months ago
2
How can I use other LLM, such as LLAMA2?
#75
wangyf456
closed
7 months ago
4
[Bug/Assistance] The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
#74
wangyf456
closed
7 months ago
1
About the ('{"detail":"Error: Task does not exist"}', 400, 'alfworld-std') error
#73
XiaoShihua
closed
7 months ago
8
[Assistance] Number of problems in the OS dataset
#72
deema-A
opened
8 months ago
2
[Feature] Integrate with LiteLLM - Evaluate 100+LLMs, 92% faster
#71
ishaan-jaff
closed
6 months ago
1
INTERACT_FAILED Error: Session does not exist
#70
glad4enkonm
closed
8 months ago
3
Evaluation results are always 0 and differ from the leaderboard
#69
lynneChan
opened
8 months ago
4
Can not run webshop task correctly
#68
lynneChan
closed
8 months ago
4
[Bug/Assistance] document typos
#67
bwin90
closed
8 months ago
1
minimum r in “Evaluation Prompt Setup”?
#66
DryPilgrim
opened
8 months ago
1
[Bug/Assistance] How to handle runs where some samples succeed and some fail
#65
wangyanli3630
closed
6 months ago
1
[Feature] Is there a way to use multiple ChatGPT api_keys to reduce how often rate limits are hit?
#64
wangyanli3630
closed
6 months ago
1
The std sets of both cg and ltp fail with: Error: Worker not responding
#63
wangyanli3630
opened
8 months ago
7
Question about the evaluation method: does it only check whether each step is correct? [Bug/Assistance]
#62
Wenze7
closed
6 months ago
4
There are some error questions in data/knowledgegraph/std.json
#61
kev123456
closed
8 months ago
2
init commit avalon
#60
HenryCai11
closed
8 months ago
0
Revert "init commit avalon"
#59
Xiao9905
closed
8 months ago
0
init commit avalon
#58
HenryCai11
closed
8 months ago
0
[Bug/Assistance] openai-text.yaml endpoint
#57
nlpcat
closed
8 months ago
1
[Bug/Assistance] webshop failed
#56
nlpcat
closed
8 months ago
14
[Bug/Assistance] document on v0.2
#55
nlpcat
closed
8 months ago
4
OS-Interaction output of os env has many Escape Sequence, Not suitable for human reading
#54
simonjoe246
closed
8 months ago
2
Cannot start properly; accessing the task raises an error
#53
Dhaizei
closed
6 months ago
54
Missing local_agent.yaml file
#52
Dhaizei
closed
8 months ago
1
Running in Colab
#51
olivarb
closed
8 months ago
8
Update Config_cn.md
#50
XueyangFeng
closed
8 months ago
0
webshop stuck at 78/80
#49
harshraj172
closed
8 months ago
2
How do you deal with the cases when the input is longer than the context length?
#48
leoozy
closed
8 months ago
2
[Feature Request] Add more difficult data in the DB task, such as Spider1.0
#47
iamlockelightning
closed
8 months ago
1