THUDM AgentBench issues

THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

https://llmbench.ai

Apache License 2.0

2.15k stars, 150 forks

issues


[Bug/Assistance] How to handle the case where some runs succeed and some fail

#65 wangyanli3630 closed 9 months ago
1
[Feature] Is there a way to use multiple ChatGPT api_keys to reduce how often access limits are hit?

#64 wangyanli3630 closed 9 months ago
1
The std sets of both cg and ltp have a problem: Error: Worker not responding

#63 wangyanli3630 opened 11 months ago
7
Question about the evaluation method: does it only check whether each step is correct? [Bug/Assistance]

#62 Wenze7 closed 9 months ago
4
There are some erroneous questions in data/knowledgegraph/std.json

#61 kev123456 closed 11 months ago
2
init commit avalon

#60 HenryCai11 closed 11 months ago
0
Revert "init commit avalon"

#59 Xiao9905 closed 11 months ago
0
init commit avalon

#58 HenryCai11 closed 11 months ago
0
[Bug/Assistance] openai-text.yaml endpoint

#57 nlpcat closed 11 months ago
1
[Bug/Assistance] webshop failed

#56 nlpcat closed 11 months ago
14
[Bug/Assistance] document on v0.2

#55 nlpcat closed 11 months ago
4
OS-Interaction: output of the OS env contains many escape sequences and is not suitable for human reading

#54 simonjoe246 closed 11 months ago
3
Cannot start properly; accessing the task raises an error

#53 Dhaizei closed 9 months ago
54
Missing local_agent.yaml file

#52 Dhaizei closed 11 months ago
1
Running in Colab

#51 olivarb closed 11 months ago
8
Update Config_cn.md

#50 XueyangFeng closed 11 months ago
0
webshop stuck at 78/80

#49 harshraj172 closed 11 months ago
2
How do you deal with the cases when the input is longer than the context length?

#48 leoozy closed 11 months ago
2
[Feature Request] Add more difficult data in the DB task, such as Spider1.0

#47 iamlockelightning closed 11 months ago
1
webshop gets all-zero results

#46 lzwqjh closed 11 months ago
4
DBBench failed

#45 nlpcat closed 11 months ago
2
Custom task or test set

#44 mahmoudialireza closed 11 months ago
12
Stuck when running webshop evaluation

#43 lzwqjh closed 1 year ago
0
webshop task: JVM exception occurred

#42 Z-ZHHH closed 1 year ago
1
how to run the webshop task

#41 Z-ZHHH closed 1 year ago
6
Problems I ran into while following the tutorial

#40 ChangFeng2015 closed 11 months ago
5
Access to Test Sets

#39 guosyjlu closed 11 months ago
4
Play AlfWorld with GPT-3.5-turbo

#38 Hua-rookie closed 11 months ago
1
docker preparation: webshop

#37 Hua-rookie closed 1 year ago
4
Traces of different evaluations

#36 Andrewzh112 closed 11 months ago
4
How to deploy so that I can interact with Ubuntu as shown in the demo

#35 ChangFeng2015 closed 11 months ago
1
Errors in dev data of OS-Interaction

#34 zwhe99 closed 11 months ago
8
Request to add scores of LLaMA-2-70B-Chat

#33 linkmancheng closed 11 months ago
1
Stuck when running webshop evaluation

#32 zwhe99 closed 1 year ago
1
AttributeError: module 'src.tasks' has no attribute 'DBBench'

#31 zwhe99 closed 1 year ago
5
python: can't open file 'evaluate.py': [Errno 2] No such file or directory

#30 Elissa0723 closed 1 year ago
1
The evaluation of knowledge graph always gets zero

#29 cyente closed 11 months ago
4
Enhancement Request: Improve 3-shot Examples in mind2web Dataset

#28 lr-tsinghua11 closed 11 months ago
0
Mind2web issue

#27 Anticope12 closed 11 months ago
3
JSONDecodeError

#26 harshraj172 closed 11 months ago
3
Missing related modules

#25 liang880912 closed 11 months ago
2
webshop task: JVM exception occurred

#24 harshraj172 closed 1 year ago
8
OS task image build failed

#23 MrPig closed 1 year ago
1
When will the Baidu Wenxin model be evaluated?

#22 vaxilicaihouxian closed 11 months ago
1
Request to update scores of claude models

#21 wooparadog closed 11 months ago
5
What temperature and max_new_tokens should be used?

#20 tju01 closed 1 year ago
3
CardGame task always running

#19 cicyby closed 11 months ago
4
How to interpret the assessment results

#18 foamliu closed 1 year ago
1
KeyError: <class 'src.configs.YAMLConfig'> in lateralthinkingpuzzle

#16 harshraj172 closed 1 year ago
3
Discussion: Next Version Requirements and Improvements

#15 Longin-Yu closed 11 months ago
4
