Question about trajectory filtering

QingChengLineOne commented 11 months ago

Are all 6 datasets filtered using reward as the metric? Or is it like AgentBench, where OS uses SR, KG uses F1, and DCG uses reward?

Dhaizei commented 11 months ago

Have you tried testing agentlm7b on agent-bench? For example, how does it perform on HH?

QingChengLineOne commented 11 months ago

> Have you tried testing agentlm7b on agent-bench? For example, how does it perform on HH?

Not yet. Right now I'm mainly confused about how to do the trajectory filtering.

lr-tsinghua11 commented 11 months ago

The metrics are the same as those used by AgentBench. Details:

| Task | Description | Example | Metric | Reward |
| --- | --- | --- | --- | --- |
| ALFWorld | Daily Household Routines | Heat food | Success Rate | If task is finished, r=1, otherwise r=0 |
| WebShop | Online Shopping | Buy a shirt | Reward | Score for selecting the correct item during shopping |
| Mind2Web | Website Navigation | Book a ticket | Step Success Rate | Evaluate the predicted action correctness compared to reference actions. |
| KG | Retrieve Entity from KG | Which team won the 2014 AFC Championship Game? | F1 | Compare the model’s predicted answers to the gold standard answers |
| DB | Database Operations | How many games did the badgers play in october? | Step Success | If MySQL query is correct, r=1, otherwise r=0 |
| OS | Interacting with OS | Count specific files | Step Success | If result from operating system is correct, r=1, otherwise r=0 |
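
For illustration only, here is a rough sketch of what per-task filtering on these rewards could look like. This is not the repository's actual code: the `Trajectory` class, the `answer_f1` and `filter_trajectories` helpers, and the cut-off values are all hypothetical, since this thread does not state the thresholds that were used.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """Hypothetical container for one interaction trajectory."""
    task: str        # e.g. "alfworld", "webshop", "kg"
    reward: float    # task-specific reward r in [0, 1]
    turns: list = field(default_factory=list)


def answer_f1(predicted, gold):
    """Set-based F1 between predicted and gold answers (KG-style metric)."""
    pred, ref = set(predicted), set(gold)
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def filter_trajectories(trajectories, min_reward, default=1.0):
    """Keep trajectories whose reward reaches the cut-off for their task.

    `min_reward` maps task name -> threshold. Both the mapping and the
    default of 1.0 are assumptions made for this sketch.
    """
    return [
        t for t in trajectories
        if t.reward >= min_reward.get(t.task, default)
    ]


if __name__ == "__main__":
    data = [
        Trajectory("alfworld", 1.0),
        Trajectory("alfworld", 0.0),
        Trajectory("kg", answer_f1(["a", "b"], ["a"])),
    ]
    # Assumed thresholds, chosen only to demonstrate the call.
    kept = filter_trajectories(data, {"alfworld": 1.0, "kg": 0.5})
    print([(t.task, round(t.reward, 3)) for t in kept])
```

Binary tasks (ALFWorld, DB, OS) naturally filter on r=1, while tasks with a continuous reward (WebShop, Mind2Web, KG) need an explicit cut-off, which is why the threshold is kept per task here.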
