THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
https://llmbench.ai
Apache License 2.0
2.15k stars
150 forks
issues
[Bug/Assistance] How to handle a run where some parts succeed and some fail
#65
wangyanli3630
closed
9 months ago
1
[Feature] Is there a way to use multiple ChatGPT api_key values to reduce how often access limits are hit?
#64
wangyanli3630
closed
9 months ago
1
The std sets of both cg and ltp fail: Error: Worker not responding
#63
wangyanli3630
opened
11 months ago
7
Question about the evaluation method: does it only check whether each step is correct? [Bug/Assistance]
#62
Wenze7
closed
9 months ago
4
There are some erroneous questions in data/knowledgegraph/std.json
#61
kev123456
closed
11 months ago
2
init commit avalon
#60
HenryCai11
closed
11 months ago
0
Revert "init commit avalon"
#59
Xiao9905
closed
11 months ago
0
init commit avalon
#58
HenryCai11
closed
11 months ago
0
[Bug/Assistance] openai-text.yaml endpoint
#57
nlpcat
closed
11 months ago
1
[Bug/Assistance] webshop failed
#56
nlpcat
closed
11 months ago
14
[Bug/Assistance] document on v0.2
#55
nlpcat
closed
11 months ago
4
OS-Interaction: output of the OS env contains many escape sequences and is not human-readable
#54
simonjoe246
closed
11 months ago
3
Cannot start normally; accessing the task raises an error
#53
Dhaizei
closed
9 months ago
54
Missing local_agent.yaml file
#52
Dhaizei
closed
11 months ago
1
Running in Colab
#51
olivarb
closed
11 months ago
8
Update Config_cn.md
#50
XueyangFeng
closed
11 months ago
0
webshop stuck at 78/80
#49
harshraj172
closed
11 months ago
2
How do you deal with the cases when the input is longer than the context length?
#48
leoozy
closed
11 months ago
2
[Feature Request] Add more difficult data in the DB task, such as Spider1.0
#47
iamlockelightning
closed
11 months ago
1
webshop gets all-zero results
#46
lzwqjh
closed
11 months ago
4
DBBench failed
#45
nlpcat
closed
11 months ago
2
Custom task or test set
#44
mahmoudialireza
closed
11 months ago
12
Stuck when running webshop evaluation
#43
lzwqjh
closed
1 year ago
0
webshop task: JVM exception occurred
#42
Z-ZHHH
closed
1 year ago
1
how to run the webshop task
#41
Z-ZHHH
closed
1 year ago
6
Problems I ran into while following the tutorial
#40
ChangFeng2015
closed
11 months ago
5
Access to Test Sets
#39
guosyjlu
closed
11 months ago
4
Play AlfWorld with GPT-3.5-turbo
#38
Hua-rookie
closed
11 months ago
1
docker preparation: webshop
#37
Hua-rookie
closed
1 year ago
4
Traces of different evaluations
#36
Andrewzh112
closed
11 months ago
4
How to deploy so that I can interact with Ubuntu as shown in the demo
#35
ChangFeng2015
closed
11 months ago
1
Errors in dev data of OS-Interaction
#34
zwhe99
closed
11 months ago
8
Request to add scores of LLaMA-2-70B-Chat
#33
linkmancheng
closed
11 months ago
1
Stuck when running webshop evaluation
#32
zwhe99
closed
1 year ago
1
AttributeError: module 'src.tasks' has no attribute 'DBBench'
#31
zwhe99
closed
1 year ago
5
python: can't open file 'evaluate.py': [Errno 2] No such file or directory
#30
Elissa0723
closed
1 year ago
1
The evaluation of knowledge graph always gets zero
#29
cyente
closed
11 months ago
4
Enhancement Request: Improve 3-shot Examples in mind2web Dataset
#28
lr-tsinghua11
closed
11 months ago
0
Mind2web issue
#27
Anticope12
closed
11 months ago
3
JSONDecodeError
#26
harshraj172
closed
11 months ago
3
Missing related modules
#25
liang880912
closed
11 months ago
2
webshop task: JVM exception occurred
#24
harshraj172
closed
1 year ago
8
OS task image build failed
#23
MrPig
closed
1 year ago
1
When will the Baidu Wenxin model be evaluated?
#22
vaxilicaihouxian
closed
11 months ago
1
Request to update scores of claude models
#21
wooparadog
closed
11 months ago
5
What temperature and max_new_tokens should be used?
#20
tju01
closed
1 year ago
3
CardGame task always running
#19
cicyby
closed
11 months ago
4
How to interpret the assessment results
#18
foamliu
closed
1 year ago
1
KeyError: <class 'src.configs.YAMLConfig'> in lateralthinkingpuzzle
#16
harshraj172
closed
1 year ago
3
Discussion: Next Version Requirements and Improvements
#15
Longin-Yu
closed
11 months ago
4