THUDM / AgentTuning

AgentTuning: Enabling Generalized Agent Abilities for LLMs
https://thudm.github.io/AgentTuning/

Question about trajectory filtering #44

Closed QingChengLineOne closed 11 months ago

QingChengLineOne commented 11 months ago

Are all six datasets filtered using reward as the metric? Or is it done as in AgentBench, where OS uses SR, KG uses F1, and DCG uses reward?

Dhaizei commented 11 months ago

Have you tried testing agentlm7b on agent-bench? How does it perform on HH, for example?

QingChengLineOne commented 11 months ago

> Have you tried testing agentlm7b on agent-bench? How does it perform on HH, for example?

Not yet. Right now I am mostly confused about how to do the trajectory filtering.

lr-tsinghua11 commented 11 months ago

The metrics are the same as those used in AgentBench. Details:

| Task | Description | Example | Reward | Reward Calculation |
| --- | --- | --- | --- | --- |
| ALFWorld | Daily Household Routines | Heat food | Success Rate | If task is finished, r=1, otherwise r=0 |
| WebShop | Online Shopping | Buy a shirt | Reward | Score for selecting the correct item during shopping |
| Mind2Web | Website Navigation | Book a ticket | Step Success Rate | Evaluate the predicted action correctness compared to reference actions |
| KG | Retrieve Entity from KG | Which team won the 2014 AFC Championship Game? | F1 | Compare the model’s predicted answers to the gold standard answers |
| DB | Database Operations | How many games did the badgers play in october? | Step Success | If MySQL query is correct, r=1, otherwise r=0 |
| OS | Interacting with OS | Count specific files | Step Success | If result from operating system is correct, r=1, otherwise r=0 |
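
For anyone reading along, here is a rough sketch of how filtering with the per-task rewards listed above could look. It is not the repository's actual code. The `Trajectory` class, the function names, and especially the threshold values (`DEFAULT_THRESHOLD`, `THRESHOLD_OVERRIDES`) are assumptions made for illustration; this thread does not state which cutoffs were used.

```python
from dataclasses import dataclass, field

# Reward metric per task, copied from the list in the comment above.
TASK_METRIC = {
    "ALFWorld": "Success Rate",
    "WebShop": "Reward",
    "Mind2Web": "Step Success Rate",
    "KG": "F1",
    "DB": "Step Success",
    "OS": "Step Success",
}

# ASSUMPTION: cutoffs are placeholders, not confirmed by this thread.
DEFAULT_THRESHOLD = 1.0
THRESHOLD_OVERRIDES = {"Mind2Web": 2 / 3}


@dataclass
class Trajectory:
    """Hypothetical container for one interaction trajectory."""
    task: str
    reward: float  # final reward r, computed with the task's own metric
    messages: list = field(default_factory=list)


def keep(traj: Trajectory) -> bool:
    """Return True if the trajectory's reward meets its task's cutoff."""
    if traj.task not in TASK_METRIC:
        raise ValueError(f"unknown task: {traj.task}")
    threshold = THRESHOLD_OVERRIDES.get(traj.task, DEFAULT_THRESHOLD)
    return traj.reward >= threshold


def filter_trajectories(trajs):
    """Keep only trajectories that pass the per-task reward cutoff."""
    return [t for t in trajs if keep(t)]


if __name__ == "__main__":
    sample = [
        Trajectory("ALFWorld", 1.0),
        Trajectory("ALFWorld", 0.0),
        Trajectory("Mind2Web", 0.7),
        Trajectory("KG", 0.8),
    ]
    for t in filter_trajectories(sample):
        print(t.task, TASK_METRIC[t.task], t.reward)
```

The point is only that each task is scored with its own metric, and the filter then compares that single number r against a cutoff, so the filtering logic itself does not need to know how r was computed.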