THUDM / AgentTuning

AgentTuning: Enabling Generalized Agent Abilities for LLMs
https://thudm.github.io/AgentTuning/

About reward #32

Closed: DryPilgrim closed this issue 12 months ago

DryPilgrim commented 1 year ago

I'd like to ask the following questions. Thank you very much for your answers :)

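  1. Is the reward used both as a model metric and as a trajectory-selection criterion? In AgentBench, DCG and WS both use the reward as the model's metric on that task, but in AgentTuning it is used to filter interaction trajectories: "Recall that each interaction trajectory receives a reward r, this allows us to automatically select high-quality trajectories based on the reward."
  2. What are the reward formulas for the 6 held-in tasks in AgentTuning? AgentBench gives two reward formulas, one for DCG and one for WS. Both are designed specifically for those two tasks and cannot be generalized to other tasks, such as DB.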
lr-tsinghua11 commented 12 months ago
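  1. Yes, for the held-in tasks. During SFT, AgentLM learns from high-quality GPT-4 interaction dialogues (selected by reward) and performs well on these tasks (evaluated by reward), while also generalizing to the remaining held-out agent tasks.
  2. These 6 held-in tasks are a subset of AgentBench. How each reward is computed can be found under "Dataset details" for each dataset in the appendix of the AgentBench paper.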
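The dual role described in this answer can be illustrated with a minimal sketch: the same per-task reward r is averaged over test episodes to report a metric, and thresholded over GPT-4 trajectories to select SFT data. The `Trajectory` shape, the [0, 1] reward range, and the default threshold of 1.0 are assumptions for illustration, not taken from the AgentTuning code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trajectory:
    task_id: str
    messages: list  # multi-turn interaction (hypothetical shape)
    reward: float   # task-specific reward r, assumed to lie in [0, 1]


def evaluate(trajectories):
    """Metric role: average reward over test-set episodes."""
    return mean(t.reward for t in trajectories)


def select_for_sft(trajectories, threshold=1.0):
    """Filtering role: keep trajectories whose reward reaches the (assumed) threshold."""
    return [t for t in trajectories if t.reward >= threshold]


trajs = [
    Trajectory("db-1", [], 1.0),
    Trajectory("db-2", [], 0.0),
    Trajectory("ws-1", [], 0.5),
]
print(evaluate(trajs))                                  # 0.5
print([t.task_id for t in select_for_sft(trajs)])       # ['db-1']
print([t.task_id for t in select_for_sft(trajs, 0.5)])  # ['db-1', 'ws-1']
```

Lowering `threshold` admits more, lower-quality trajectories; a strict threshold keeps only fully successful ones.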
DryPilgrim commented 12 months ago

I'd like to ask the following questions. Thank you very much for your answers :)

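  1. I can't find how the reward is computed in the "Dataset details" of the AgentBench appendix. For example, C.1 for DB only says "Metrics. We measure the Success Rate of agents in completing instructions." That is not a reward score computed over a trajectory (and the AgentBench DB data contains no trajectories).
  2. The AgentBench DB data has no interaction trajectories, so how is CoT with Actions applied?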
  3. Why is #Test larger than #Dev in AgentBench? For example, for DB, #Dev=60 and #Test=300. Shouldn't the training set be larger than the test set? For reference:
    • AgentBench uses CoT with Actions:
      Section 2 of the AgentBench paper says: "Since LLM-as-Agent requires LLMs' strong reasoning ability, CoT (Wei et al., 2022b), which has been considered a de facto strategy in related evaluation together with actions (Yao et al., 2023b), is also adopted in AGENTBENCH."
    • A sample from the AgentBench DB data:
      {
      "description": "how many weeks did julie covington's \"don't cry for me argentina\" spend at the top of australia's singles chart?",
      "label": [
      "7"
      ],
      "create": {
      "database": "wikitq",
      "init": "wikitq_init.sql"
      },
      "table": {
      "table_name": "Music Chart History",
      "table_info": {
          "columns": [
              {
                  "name": "#",
                  "type": "INT"
              },
              {
                  "name": "Title",
                  "type": "TEXT"
              },
              {
                  "name": "Artist",
                  "type": "TEXT"
              },
              {
                  "name": "Highest pos. reached",
                  "type": "INT"
              },
              {
                  "name": "weeks at No. 1",
                  "type": "TEXT"
              }
          ],
          "rows": [
              [
                  "1.",
                  "\"Don't Cry for Me Argentina\"",
                  "Julie Covington",
                  "1",
                  "7"
              ],
              [
                  "2.",
                  "\"The Way You That You Do It\"",
                  "Pussyfoot",
                  "1",
                  "7"
              ],
              [
                  "3.",
                  "\"I Just Want to Be Your Everything\"",
                  "Andy Gibb",
                  "1",
                  "7"
              ],
              [
                  "4.",
                  "\"That's Rock and Roll\"",
                  "Shaun Cassidy",
                  "2",
                  ""
              ],
              [
                  "5.",
                  "\"Living Next Door to Alice\"",
                  "Smokie",
                  "2",
                  ""
              ],
              [
                  "6.",
                  "\"I Go To Rio\"",
                  "Peter Allen",
                  "1",
                  "5"
              ],
              [
                  "7.",
                  "\"Torn Between Two Lovers\"",
                  "Mary McGregor",
                  "1",
                  "4"
              ],
              [
                  "8.",
                  "\"Walk Right In\"",
                  "Dr Hook",
                  "1",
                  "5"
              ],
              [
                  "9.",
                  "\"You're Moving Out Today\"",
                  "Carole Bayer Sager",
                  "1",
                  "4"
              ],
              [
                  "10.",
                  "\"If You Leave Me Now\"",
                  "Chicago",
                  "1",
                  "5 (pkd #1 in 76 & 77)"
              ],
              [
                  "11.",
                  "\"Don't Give Up on Us\"",
                  "David Soul",
                  "1",
                  "3"
              ],
              [
                  "12.",
                  "\"Lido Shuffle\" / \"What Can I Say\"",
                  "Boz Scaggs",
                  "2",
                  ""
              ],
              [
                  "13.",
                  "\"You and Me\"",
                  "Alice Cooper",
                  "2",
                  ""
              ],
              [
                  "14.",
                  "\"Dance Little Lady Dance\"",
                  "Tina Charles",
                  "4",
                  ""
              ],
              [
                  "15.",
                  "\"When I Need You\"",
                  "Leo Sayer",
                  "8",
                  ""
              ],
              [
                  "16.",
                  "\"Don't Fall in Love\"",
                  "Ferrets",
                  "2",
                  ""
              ],
              [
                  "17.",
                  "\"I Feel Love\"",
                  "Donna Summer",
                  "1",
                  "1"
              ],
              [
                  "18.",
                  "\"Help is on its Way\"",
                  "Little River Band",
                  "1",
                  "1"
              ],
              [
                  "19.",
                  "\"You Gotta Get Up and Dance\"",
                  "Supercharge",
                  "3",
                  ""
              ],
              [
                  "20.",
                  "\"Mull of Kintyre\"",
                  "Wings",
                  "1",
                  "11 (pkd #1 in 77 & 78)"
              ],
              [
                  "21.",
                  "\"Don't Leave Me This Way\"",
                  "Thelma Houston",
                  "6",
                  ""
              ],
              [
                  "22.",
                  "\"Ain't Gonna Bump No More with No Big Fat Woman\"",
                  "Joe Tex",
                  "2",
                  ""
              ],
              [
                  "23.",
                  "\"You're in My Heart\"",
                  "Rod Stewart",
                  "1",
                  "1"
              ],
              [
                  "24.",
                  "\"Ma Baker\"",
                  "Boney M",
                  "5",
                  ""
              ],
              [
                  "25.",
                  "\"Lucille\"",
                  "Kenny Rogers",
                  "7",
                  ""
              ],
              [
                  "26.",
                  "\"Livin' la Vida Loca\"",
                  "Ricky Martin",
                  "1",
                  "3"
              ],
              [
                  "27.",
                  "\"Smooth\"",
                  "Santana featuring Rob Thomas",
                  "1",
                  "12"
              ],
              [
                  "28.",
                  "\"No Scrubs\"",
                  "TLC",
                  "3",
                  ""
              ],
              [
                  "29.",
                  "\"All Star\"",
                  "Smash Mouth",
                  "4",
                  ""
              ],
              [
                  "30.",
                  "\"Baby One More Time\"",
                  "Britney Spears",
                  "1",
                  "2"
              ],
              [
                  "31.",
                  "\"Say My Name\"",
                  "Destiny's Child",
                  "1",
                  "3"
              ],
              [
                  "32.",
                  "\"Genie in a Bottle\"",
                  "Christina Aguilera",
                  "1",
                  "5"
              ],
              [
                  "33.",
                  "\"Smooth Criminal\"",
                  "Michael Jackson",
                  "7",
                  ""
              ],
              [
                  "34.",
                  "\"I Will Always Love You\"",
                  "Whitney Houston",
                  "1",
                  "10"
              ],
              [
                  "35.",
                  "\"You Are Not Alone\"",
                  "Michael Jackson",
                  "1",
                  "5"
              ]
          ]
      }
      },
      "evaluation": "",
      "example": "",
      "type": [
      "other"
      ],
      "heads": [
      "#",
      "Title",
      "Artist",
      "Highest pos. reached",
      "weeks at No. 1"
      ],
      "add_description": "The name of this table is Music Chart History, and the headers of this table are #,Title,Artist,Highest pos. reached,weeks at No. 1.",
      "sql": {
      "query": "SELECT weeks_at_No_1 FROM `Music Chart History` WHERE Artist = 'Julie Covington' AND Title = 'Don\\'t Cry for Me Argentina';",
      "length": 123
      },
      "source": "wikitq"
      }
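Regarding questions 1 and 2: a static DB record like this one describes a task, not a trajectory; a trajectory only comes into existence once an agent acts on the task. The following minimal sketch loads two rows of the record's table into an in-memory SQLite database, records one CoT-with-action turn (thought, SQL action, observation), and scores the final answer against `label` with a binary reward, under which the success rate is simply the mean reward. The helper names (`build_env`, `run_step`, `db_reward`), the single scripted turn, and the order-insensitive exact-match rule are assumptions for illustration, not AgentBench's actual evaluation code.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Step:
    thought: str       # CoT reasoning text produced by the agent
    action: str        # the SQL statement the agent chose to run
    observation: list  # rows the database returned


@dataclass
class Trajectory:
    description: str                            # the task question
    steps: list = field(default_factory=list)   # CoT-with-action turns
    answer: list = field(default_factory=list)  # final answer values


def build_env(table_name, heads, rows):
    """Hypothetical helper: load a record's table into an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{h}" TEXT' for h in heads)  # quote headers containing spaces/dots
    conn.execute(f'CREATE TABLE "{table_name}" ({cols})')
    marks = ", ".join("?" for _ in heads)
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({marks})', rows)
    return conn


def run_step(conn, traj, thought, sql, params=()):
    """One turn: record the thought, execute the SQL action, store the observation."""
    rows = [list(r) for r in conn.execute(sql, params).fetchall()]
    traj.steps.append(Step(thought, sql, rows))
    return rows


def db_reward(answer, label):
    """Assumed rule: 1.0 if stripped values match the label (order-insensitive), else 0.0."""
    def norm(xs):
        return sorted(str(x).strip() for x in xs)
    return 1.0 if norm(answer) == norm(label) else 0.0


# Two rows excerpted from the record above; all values stored as TEXT.
heads = ["#", "Title", "Artist", "Highest pos. reached", "weeks at No. 1"]
rows = [
    ["1.", "\"Don't Cry for Me Argentina\"", "Julie Covington", "1", "7"],
    ["6.", "\"I Go To Rio\"", "Peter Allen", "1", "5"],
]
conn = build_env("Music Chart History", heads, rows)
traj = Trajectory("how many weeks did julie covington's \"don't cry for me argentina\" "
                  "spend at the top of australia's singles chart?")
result = run_step(
    conn, traj,
    thought="Filter the table by artist and read the weeks-at-No.-1 column.",
    sql='SELECT "weeks at No. 1" FROM "Music Chart History" WHERE Artist = ?',
    params=("Julie Covington",),
)
traj.answer = [row[0] for row in result]
print(traj.answer, db_reward(traj.answer, ["7"]))  # ['7'] 1.0
```

In a real run the thought and SQL would come from the model (e.g. GPT-4) over several turns, and the saved `traj.steps` would be the interaction trajectory that a reward threshold keeps or discards.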