OpenBMB / ToolBench

[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language model for tool learning.
https://openbmb.github.io/ToolBench/
Apache License 2.0
4.59k stars 398 forks source link

为什么要将多轮交互的样本拆分来训练? #109

Closed dingguo1996 closed 9 months ago

dingguo1996 commented 10 months ago

感谢作者开源ToolBench代码和数据。通常大家对多轮对话样本的训练会将assistant内容调整loss weight=1,其他诸如system、user等内容调整loss weight=0的方式来训练对话样本。像请教一下为什么在Toolbench上要将多轮交互的样本拆分来训练?

例如这里的代码Taget是每个样本的最后一轮assistant的内容:https://github.com/OpenBMB/ToolBench/blob/master/toolbench/train/train.py#L126

在数据toollama_G123_dfs_train.json中有很多相同instruction,同一条解决路径只是处于不同step的样本。如下: Step 1: I need to create a meme for my family reunion. Can you generate a Sponge Bob meme with the text 'Family love' on it? Additionally, I would like to generate a Pet Pet meme GIF using a family photo as the input image. ... Step 10: I need to create a meme for my family reunion. Can you generate a Sponge Bob meme with the text 'Family love' on it? Additionally, I would like to generate a Pet Pet meme GIF using a family photo as the input image.

这个是为了shuffle同一个conversation的不同Step样本来防止模型过度拟合最后进入模型的样本吗?

pooruss commented 10 months ago

您好,感谢提出问题~

我们对样本拆开进行训练,主要是因为DFS算法造出来的数据特性,因为算法中有回退和结点扩张步骤,每一轮的对话的历史对话可能发生改变,以G3中3_ChatGPT_DFS_woFilter_w2.json数据的第二和第三轮对话为例:

      [
        {
          "role": "system",
          "content": "You are AutoGPT, you can use many tools(functions) to do the following task.\nFirst I will give you the task description, and your task start.\nAt each step, you need to give your thought to analyze the status now and what to do next, with a function call to actually excute your step.\nAfter the call, you will get the call result, and you are now in a new state.\nThen you will analyze your status now, then decide what to do next...\nAfter many (Thought-call) pairs, you finally perform the task, then you can give your finial answer.\nRemember: \n1.the state change is irreversible, you can't go back to one of the former state, if you want to restart the task, say \"I give up and restart\".\n2.All the thought is short, at most in 5 sentence.\n3.You can do more then one trys, so if your plan is to continusly try some conditions, you can do one of the conditions per try.\nLet's Begin!\nTask description: You should use functions to help handle the real time user querys. Remember:\n1.ALWAYS call \"Finish\" function at the end of the task. And the final answer should contain enough information to show to the user,If you can't handle the task, or you find that function calls always fail(the function is not valid now), use function Finish->give_up_and_restart.\n2.Do not use origin tool names, use only subfunctions' names.\nYou have access of the following tools:\n1.investors_exchange_iex_trading: The IEX API is a is a free, web-based API supplying IEX quoting and trading data.\n2.the_cocktail_db: Cocktail recipes with high-quality images stored on AWS S3 for fast accessibility. Easy to use!Category: Food\n3.web_search: Billions of webpages, images and news with a single API call.  Visit us at: usearch.com\n4.live_sports_odds: Odds data API for NFL, NBA, MLB, EPL, AFL and loads more, from US, UK, EU and Aussie bookmakers. Get started for free!\n"
        },
        {
          "role": "user",
          "content": "\nI'm planning a family vacation to a beach destination. Can you recommend some refreshing cocktails to enjoy by the beach? Also, provide me with the historical open and close prices for Qualcomm's stock. Furthermore, suggest some sports activities that can be enjoyed on the beach.\nBegin!\n"
        },
        {
          "role": "assistant",
          "content": null,
          "function_call": {
            "name": "list_of_cocktails_for_the_cocktail_db",
            "arguments": "{}"
          }
        },
        {
          "role": "function",
          "name": "list_of_cocktails_for_the_cocktail_db",
          "content": "{\"error\": \"\", \"response\": \"[{\\\"id\\\":\\\"1\\\",\\\"title\\\":\\\"Nutella and cognac coffee cocktail\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/1.jpg\\\"},{\\\"id\\\":\\\"2\\\",\\\"title\\\":\\\"Easy rhubarb cordial\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/2.jpg\\\"},{\\\"id\\\":\\\"3\\\",\\\"title\\\":\\\"Bottled chocolate orange negroni\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/3.jpg\\\"},{\\\"id\\\":\\\"4\\\",\\\"title\\\":\\\"Pickled bramble martini\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/4.jpg\\\"},{\\\"id\\\":\\\"5\\\",\\\"title\\\":\\\"Toasted marshmallow hot-buttered rum\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/5.jpg\\\"},{\\\"id\\\":\\\"6\\\",\\\"title\\\":\\\"Sangria\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/6.jpg\\\"},{\\\"id\\\":\\\"7\\\",\\\"title\\\":\\\"Umibozu yuzu cocktail\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/cocta..."
        },
        {
          "role": "user",
          "content": "This is not the first time you try this task, all previous trails failed.\nBefore you generate my thought for this state, I will first show you your previous actions for this state, and then you must generate actions that is different from all of them. Here are some previous actions candidates:\n[\n  {\n    \"name\": \"detailed_cocktail_recipe_by_id_for_the_cocktail_db\",\n    \"arguments\": \"{\\n  \\\"is_id\\\": \\\"6\\\"\\n}\",\n    \"function_output\": \"{\\\"error\\\": \\\"request invalid, data error. status_code=500\\\", \\\"response\\\": \\\"\\\"}\",\n    \"mento-carlo-action-value\": 0.0\n  }\n]\nagain, your former observation: {\"error\": \"\", \"response\": \"[{\\\"id\\\":\\\"1\\\",\\\"title\\\":\\\"Nutella and cognac coffee cocktail\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/1.jpg\\\"},{\\\"id\\\":\\\"2\\\",\\\"title\\\":\\\"Easy rhubarb cordial\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/2.jpg\\\"},{\\\"id\\\":\\\"3\\\",\\\"title\\\":\\\"Bottled chocolate orange negroni\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/3.jpg\\\"},{\\\"id\\\":\\\"4\\\",\\\"title\\\":\\\"Pickled bramble martini\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/4.jpg\\\"},{\\\"id\\\":\\\"5\\\",\\\"title\\\":\\\"Toasted marshmallow hot-buttered rum\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/5.jpg\\\"},{\\\"id\\\":\\\"6\\\",\\\"title\\\":\\\"Sangria\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/6.jpg\\\"},{\\\"id\\\":\\\"7\\\",\\\"title\\\":\\\"Umibozu yuzu cocktail\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/cocta...\n\nRemember you are now in the intermediate state of a trail, you will first analyze the now state and previous action candidates, then make actions that is different from all the previous.",
          "valid": false
        },
        {
          "role": "assistant",
          "content": "Based on the previous action, it seems that the attempt to retrieve the detailed recipe for cocktail ID 6 failed. Since the previous attempt was unsuccessful, I will try a different approach this time.\n\nThought:\n1. I will use the \"list_of_cocktails_for_the_cocktail_db\" function to get a list of cocktails available.\n2. From the list, I will select a cocktail to recommend for enjoying by the beach.\n3. Next, I will use the \"ohlc_for_investors_exchange_iex_trading\" function to retrieve the historical open and close prices for Qualcomm's stock.\n4. Finally, I will use the \"v4_sports_for_live_sports_odds\" function to suggest sports activities that can be enjoyed on the beach.\n\nActions:\n1. Call the \"list_of_cocktails_for_the_cocktail_db\" function.\n2. Call the \"ohlc_for_investors_exchange_iex_trading\" function with symbol \"QCOM\".\n3. Call the \"v4_sports_for_live_sports_odds\" function.\n\n",
          "function_call": {
            "name": "list_of_cocktails_for_the_cocktail_db",
            "arguments": "{}"
          }
        }
      ],
      [
        {
          "role": "system",
          "content": "You are AutoGPT, you can use many tools(functions) to do the following task.\nFirst I will give you the task description, and your task start.\nAt each step, you need to give your thought to analyze the status now and what to do next, with a function call to actually excute your step.\nAfter the call, you will get the call result, and you are now in a new state.\nThen you will analyze your status now, then decide what to do next...\nAfter many (Thought-call) pairs, you finally perform the task, then you can give your finial answer.\nRemember: \n1.the state change is irreversible, you can't go back to one of the former state, if you want to restart the task, say \"I give up and restart\".\n2.All the thought is short, at most in 5 sentence.\n3.You can do more then one trys, so if your plan is to continusly try some conditions, you can do one of the conditions per try.\nLet's Begin!\nTask description: You should use functions to help handle the real time user querys. Remember:\n1.ALWAYS call \"Finish\" function at the end of the task. And the final answer should contain enough information to show to the user,If you can't handle the task, or you find that function calls always fail(the function is not valid now), use function Finish->give_up_and_restart.\n2.Do not use origin tool names, use only subfunctions' names.\nYou have access of the following tools:\n1.investors_exchange_iex_trading: The IEX API is a is a free, web-based API supplying IEX quoting and trading data.\n2.the_cocktail_db: Cocktail recipes with high-quality images stored on AWS S3 for fast accessibility. Easy to use!Category: Food\n3.web_search: Billions of webpages, images and news with a single API call.  Visit us at: usearch.com\n4.live_sports_odds: Odds data API for NFL, NBA, MLB, EPL, AFL and loads more, from US, UK, EU and Aussie bookmakers. Get started for free!\n"
        },
        {
          "role": "user",
          "content": "\nI'm planning a family vacation to a beach destination. Can you recommend some refreshing cocktails to enjoy by the beach? Also, provide me with the historical open and close prices for Qualcomm's stock. Furthermore, suggest some sports activities that can be enjoyed on the beach.\nBegin!\n"
        },
        {
          "role": "assistant",
          "content": null,
          "function_call": {
            "name": "list_of_cocktails_for_the_cocktail_db",
            "arguments": "{}"
          }
        },
        {
          "role": "function",
          "name": "list_of_cocktails_for_the_cocktail_db",
          "content": "{\"error\": \"\", \"response\": \"[{\\\"id\\\":\\\"1\\\",\\\"title\\\":\\\"Nutella and cognac coffee cocktail\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/1.jpg\\\"},{\\\"id\\\":\\\"2\\\",\\\"title\\\":\\\"Easy rhubarb cordial\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/2.jpg\\\"},{\\\"id\\\":\\\"3\\\",\\\"title\\\":\\\"Bottled chocolate orange negroni\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/3.jpg\\\"},{\\\"id\\\":\\\"4\\\",\\\"title\\\":\\\"Pickled bramble martini\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/4.jpg\\\"},{\\\"id\\\":\\\"5\\\",\\\"title\\\":\\\"Toasted marshmallow hot-buttered rum\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/5.jpg\\\"},{\\\"id\\\":\\\"6\\\",\\\"title\\\":\\\"Sangria\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/6.jpg\\\"},{\\\"id\\\":\\\"7\\\",\\\"title\\\":\\\"Umibozu yuzu cocktail\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/cocta..."
        },
        {
          "role": "assistant",
          "content": "Based on the previous action, it seems that the attempt to retrieve the detailed recipe for cocktail ID 6 failed. Since the previous attempt was unsuccessful, I will try a different approach this time.\n\nThought:\n1. I will use the \"list_of_cocktails_for_the_cocktail_db\" function to get a list of cocktails available.\n2. From the list, I will select a cocktail to recommend for enjoying by the beach.\n3. Next, I will use the \"ohlc_for_investors_exchange_iex_trading\" function to retrieve the historical open and close prices for Qualcomm's stock.\n4. Finally, I will use the \"v4_sports_for_live_sports_odds\" function to suggest sports activities that can be enjoyed on the beach.\n\nActions:\n1. Call the \"list_of_cocktails_for_the_cocktail_db\" function.\n2. Call the \"ohlc_for_investors_exchange_iex_trading\" function with symbol \"QCOM\".\n3. Call the \"v4_sports_for_live_sports_odds\" function.\n\n",
          "function_call": {
            "name": "list_of_cocktails_for_the_cocktail_db",
            "arguments": "{}"
          }
        },
        {
          "role": "function",
          "name": "list_of_cocktails_for_the_cocktail_db",
          "content": "{\"error\": \"\", \"response\": \"[{\\\"id\\\":\\\"1\\\",\\\"title\\\":\\\"Nutella and cognac coffee cocktail\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/1.jpg\\\"},{\\\"id\\\":\\\"2\\\",\\\"title\\\":\\\"Easy rhubarb cordial\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/2.jpg\\\"},{\\\"id\\\":\\\"3\\\",\\\"title\\\":\\\"Bottled chocolate orange negroni\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/3.jpg\\\"},{\\\"id\\\":\\\"4\\\",\\\"title\\\":\\\"Pickled bramble martini\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/4.jpg\\\"},{\\\"id\\\":\\\"5\\\",\\\"title\\\":\\\"Toasted marshmallow hot-buttered rum\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/5.jpg\\\"},{\\\"id\\\":\\\"6\\\",\\\"title\\\":\\\"Sangria\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/6.jpg\\\"},{\\\"id\\\":\\\"7\\\",\\\"title\\\":\\\"Umibozu yuzu cocktail\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/cocta..."
        },
        {
          "role": "user",
          "content": "This is not the first time you try this task, all previous trails failed.\nBefore you generate my thought for this state, I will first show you your previous actions for this state, and then you must generate actions that is different from all of them. Here are some previous actions candidates:\n[\n  {\n    \"name\": \"ohlc_for_investors_exchange_iex_trading\",\n    \"arguments\": \"{\\n  \\\"symbol\\\": \\\"QCOM\\\"\\n}\",\n    \"function_output\": \"{\\\"error\\\": \\\"\\\", \\\"response\\\": \\\"An API key is required to access this data and no key was provided\\\"}\",\n    \"mento-carlo-action-value\": 0.0\n  }\n]\nagain, your former observation: {\"error\": \"\", \"response\": \"[{\\\"id\\\":\\\"1\\\",\\\"title\\\":\\\"Nutella and cognac coffee cocktail\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/1.jpg\\\"},{\\\"id\\\":\\\"2\\\",\\\"title\\\":\\\"Easy rhubarb cordial\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/2.jpg\\\"},{\\\"id\\\":\\\"3\\\",\\\"title\\\":\\\"Bottled chocolate orange negroni\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/3.jpg\\\"},{\\\"id\\\":\\\"4\\\",\\\"title\\\":\\\"Pickled bramble martini\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/4.jpg\\\"},{\\\"id\\\":\\\"5\\\",\\\"title\\\":\\\"Toasted marshmallow hot-buttered rum\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/5.jpg\\\"},{\\\"id\\\":\\\"6\\\",\\\"title\\\":\\\"Sangria\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/coctails_api/6.jpg\\\"},{\\\"id\\\":\\\"7\\\",\\\"title\\\":\\\"Umibozu yuzu cocktail\\\",\\\"difficulty\\\":\\\"Easy\\\",\\\"image\\\":\\\"https://apipics.s3.amazonaws.com/cocta...\n\nRemember you are now in the intermediate state of a trail, you will first analyze the now state and previous action candidates, then make actions that is different from all the previous.",
          "valid": false
        },
        {
          "role": "assistant",
          "content": "Based on the analysis of the current state and the previous action, which failed due to the lack of an API key, I will formulate a new plan to recommend refreshing cocktails, retrieve historical stock prices for Qualcomm, and suggest beach sports activities.\n\nThought:\n1. I will use the \"list_of_cocktails_for_the_cocktail_db\" function to get a list of cocktails available.\n2. From the list, I will select a cocktail to recommend for enjoying by the beach.\n3. Next, I will use the \"Finish\" function to provide the recommendation and move on to the next step.\n4. Then, I will use the \"ohlc_for_investors_exchange_iex_trading\" function to retrieve the historical open and close prices for Qualcomm's stock.\n5. Finally, I will use the \"v4_sports_for_live_sports_odds\" function to suggest sports activities that can be enjoyed on the beach.\n\nActions:\n1. Call the \"list_of_cocktails_for_the_cocktail_db\" function.\n2. Select a cocktail recommendation and call the \"Finish\" function with the final answer.\n3. Call the \"ohlc_for_investors_exchange_iex_trading\" function with the symbol \"QCOM\".\n4. Call the \"v4_sports_for_live_sports_odds\" function.",
          "function_call": {
            "name": "list_of_cocktails_for_the_cocktail_db",
            "arguments": "{}"
          }
        }
      ],

示例中上一轮和下一轮的历史对话结构发生改变,所以为了简化数据处理和代码迁移,我们将数据样本拆开进行训练;

是的,我们会对所有数据所有step进行shuffle