Closed: mahmoudialireza closed this issue 1 year ago.
Yes. We are working on a newer version of the framework for easier deployment and extension.
Thanks. For now, how can I do that? Is there any documentation or instruction for that?
@mahmoudialireza Hi, thanks for your interest in AgentBench. Our new version v0.2 has been updated to the repo and please take a look at our readme to find the new documentation. Feel free to reopen this issue if you need further help.
Hi @Xiao9905 @zhc7 ,
I wonder how to add new task into AgentBench. Could you point me where the guide for adding new dataset/task?
Thanks.
Hi @chiyuzhang94 I just added the guide: https://github.com/THUDM/AgentBench/blob/main/docs/Extension_en.md
Hi @zhc7 ,
I have prepared the task.py script for my new task, created a new yaml file for it, and added the new task yaml to "task_assembly.yaml". When I tried to run the task with python -m src.assigner --config configs/assignments/new_task, I got this error:
File "/home/AgentBench/src/typings/config.py", line 83, in post_validate
assert (
AssertionError: Task new_task is not defined.
I wonder where I should define the task.
Hi @chiyuzhang94 , your steps are correct. Can you show me your config yamls? Including task_assembly.yaml and all the yaml you added or modified.
Thanks. Here they are
AgentBench/configs/assignments/spam_email.yaml

import: definition.yaml
concurrency:
  task:
    spam_email: 5
  agent:
    gpt-3.5-turbo: 5
assignments: # List[Assignment] | Assignment
  - agent: # "task": List[str] | str , "agent": List[str] | str
      - gpt-3.5-turbo
    task:
      - spam_email
output: "outputs/{TIMESTAMP}"
AgentBench/configs/tasks/spam_email.yaml

default:
  module: src.server.tasks.spam_email.SpamEmail
  parameters:
    data_path: "data/spam_email/"
    max_step: 5
task_assembly.yaml

default:
  docker:
    command: umask 0; [ -f /root/.setup.sh ] && bash /root/.setup.sh;
import:
  - webshop.yaml
  - dbbench.yaml
  - mind2web.yaml
  - card_game.yaml
  - kg.yaml
  - os.yaml
  - ltp.yaml
  - alfworld.yaml
  - avalon.yaml
  - spam_email.yaml
So the problem here is that you didn't actually define the task. What you have to do is change "default" to "spam_email" in configs/tasks/spam_email.yaml. The logic here is that the assignment config needs to have task definitions from the task assembly, which imports all the tasks from their separate configs. You can view "import" as something like "include" in C: it is more like a copy of the imported file, regardless of the file name. The reason why all the configs we provide have a "default" field is that there are actually several tasks within one file that share some fields. More information can be found in docs/Config_en.md. I hope this solves your problem! @chiyuzhang94
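For reference, and assuming the file contents shared earlier in this thread, the fixed configs/tasks/spam_email.yaml would look like this. Only the top-level key changes, so that the name "spam_email" is what gets defined when the file is imported into task_assembly.yaml:

```yaml
spam_email:
  module: src.server.tasks.spam_email.SpamEmail
  parameters:
    data_path: "data/spam_email/"
    max_step: 5
```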
Thanks for the prompt reply. This solved the issue.
Hi @zhc7 ,
I have a question about how to debug. I found that it is hard to interactively debug due to use of multi-process and server. I wonder if you have any experience or suggestions to debug and investigate the outputs in task scripts (e.g., task.py in AgentBench/src/server/tasks/xxxx/).
Thanks.
I assume you mean something like attaching a debugger to the process, right? I suggest you first set the number of processes to 1. Then you can start a task worker manually and attach a debugger to it. You may also add some print statements or assertions in your task file to check that everything is working as expected.
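The print/assert approach can be sketched as follows. This is a hypothetical, standalone example, not the actual AgentBench Task API: the class name, the classify method, and the labels are made up for illustration. The idea is just to log inputs at each step and assert invariants, so a single-process run is easy to trace:

```python
import logging

# Log to stderr so the trace is visible when running the worker manually.
logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("spam_email")


class SpamEmailTask:
    """Hypothetical sketch of a task; real AgentBench tasks inherit
    from the framework's Task base class and have different methods."""

    def __init__(self, data_path: str, max_step: int):
        self.data_path = data_path
        self.max_step = max_step

    def classify(self, email_text: str) -> str:
        # Log the input of each step so you can follow the run.
        logger.debug("classifying %r (max_step=%d)",
                     email_text[:40], self.max_step)
        label = "spam" if "free money" in email_text.lower() else "ham"
        # Assertions catch contract violations early, near their cause,
        # instead of surfacing deep inside the server loop.
        assert label in ("spam", "ham"), f"unexpected label {label!r}"
        return label


task = SpamEmailTask("data/spam_email/", max_step=5)
print(task.classify("Claim your FREE MONEY now"))  # -> spam
```

With concurrency set to 1 in the assignment yaml, this kind of instrumented code also works well under a debugger such as pdb, since there is only one process to attach to.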
Hello Team, is it possible to create a customized test set for a specific domain (for example, medical or financial) and use this tool to evaluate fine-tuned models? Thanks in advance.