google-research / task-oriented-dialogue


Fix multi-frame turn_info shallow copy #6

Open WeixuanZ opened 1 year ago

WeixuanZ commented 1 year ago

An example of a turn with multiple frames is turn 12 of dialogue 8_00001 (in sgd/dev/dialogues_008.json):

      {
        "frames": [
          {
            "actions": [
              {
                "act": "INFORM_INTENT",
                "canonical_values": [
                  "GetCarsAvailable"
                ],
                "slot": "intent",
                "values": [
                  "GetCarsAvailable"
                ]
              }
            ],
            "service": "RentalCars_1",
            "slots": [],
            "state": {
              "active_intent": "GetCarsAvailable",
              "requested_slots": [],
              "slot_values": {}
            }
          },
          {
            "actions": [
              {
                "act": "THANK_YOU",
                "canonical_values": [],
                "slot": "",
                "values": []
              }
            ],
            "service": "Buses_1",
            "slots": [],
            "state": {
              "active_intent": "BuyBusTicket",
              "requested_slots": [],
              "slot_values": {
                "from_location": [
                  "SF",
                  "San Francisco"
                ],
                "leaving_date": [
                  "7th of March",
                  "March 7th"
                ],
                "leaving_time": [
                  "4:30 pm"
                ],
                "to_location": [
                  "Long Beach",
                  "Long Beach, CA"
                ],
                "travelers": [
                  "3"
                ]
              }
            }
          }
        ],
        "speaker": "USER",
        "utterance": "Thanks for all your help. Could you help me find a rental car there?"
      },

Current output:

0=time of bus leaving for journey 1=city where bus is going to 2=fare per ticket for journey 3=station where bus is going to 4=date of bus leaving for journey 5=number of travelers for journey 6=number of transfers in journey 7=city where bus is leaving from 8=station where bus is leaving from i0=find a bus journey for a given pair of cities i1=buy tickets for a bus journey [user] i'd like to get a bus ticket. [system] where are you leaving from and where are you going? how many tickets will you need? [user] i'd like to leave from sf and go to long beach, ca. i need 3 tickets. [system] what day and time are you wanting to leave? [user] i'd like to leave at 4:45 in the evening on the 7th of march. [system] please confirm the following: you want 3 tickets for a bus leaving from san francisco for long beach on march 7th at 4:45 pm. [user] yes, that will work. what station is it going to? [system] it's going to downtown station. i was unable to buy the ticket. does the bus leaving at 4:30 pm work? it costs $42. [user] yes, that will work. [system] the ticket has been bought. [user] what station am i leaving from? [system] the bus is leaving from salesforce transit center. [user] thanks for all your help. could you help me find a rental car there?  [states] 0=4:30 pm 1=long beach | long beach, ca 4=7th of march | march 7th 5=3 7=sf | san francisco    8_00001 12  1
0=time of bus leaving for journey 1=city where bus is going to 2=fare per ticket for journey 3=station where bus is going to 4=date of bus leaving for journey 5=number of travelers for journey 6=number of transfers in journey 7=city where bus is leaving from 8=station where bus is leaving from i0=find a bus journey for a given pair of cities i1=buy tickets for a bus journey [user] i'd like to get a bus ticket. [system] where are you leaving from and where are you going? how many tickets will you need? [user] i'd like to leave from sf and go to long beach, ca. i need 3 tickets. [system] what day and time are you wanting to leave? [user] i'd like to leave at 4:45 in the evening on the 7th of march. [system] please confirm the following: you want 3 tickets for a bus leaving from san francisco for long beach on march 7th at 4:45 pm. [user] yes, that will work. what station is it going to? [system] it's going to downtown station. i was unable to buy the ticket. does the bus leaving at 4:30 pm work? it costs $42. [user] yes, that will work. [system] the ticket has been bought. [user] what station am i leaving from? [system] the bus is leaving from salesforce transit center. [user] thanks for all your help. could you help me find a rental car there?  [states] 0=4:30 pm 1=long beach | long beach, ca 4=7th of march | march 7th 5=3 7=sf | san francisco    8_00001 12  1

Output after this commit:

0=model name of rental car 1=location of rental car pickup 2=date of rental car pickup 3=time of rental car pickup 4=total price of car rental 5=city to pick up the rental car 6=category to which rental car belongs 7=date of rental car drop-off i0=search for available rental cars by city and date i1=reserve car rental for given dates and location [user] i'd like to get a bus ticket. [system] where are you leaving from and where are you going? how many tickets will you need? [user] i'd like to leave from sf and go to long beach, ca. i need 3 tickets. [system] what day and time are you wanting to leave? [user] i'd like to leave at 4:45 in the evening on the 7th of march. [system] please confirm the following: you want 3 tickets for a bus leaving from san francisco for long beach on march 7th at 4:45 pm. [user] yes, that will work. what station is it going to? [system] it's going to downtown station. i was unable to buy the ticket. does the bus leaving at 4:30 pm work? it costs $42. [user] yes, that will work. [system] the ticket has been bought. [user] what station am i leaving from? [system] the bus is leaving from salesforce transit center. [user] thanks for all your help. could you help me find a rental car there?  [states]    8_00001 12  0
0=date of bus leaving for journey 1=station where bus is leaving from 2=city where bus is leaving from 3=time of bus leaving for journey 4=city where bus is going to 5=number of travelers for journey 6=station where bus is going to 7=number of transfers in journey 8=fare per ticket for journey i0=find a bus journey for a given pair of cities i1=buy tickets for a bus journey [user] i'd like to get a bus ticket. [system] where are you leaving from and where are you going? how many tickets will you need? [user] i'd like to leave from sf and go to long beach, ca. i need 3 tickets. [system] what day and time are you wanting to leave? [user] i'd like to leave at 4:45 in the evening on the 7th of march. [system] please confirm the following: you want 3 tickets for a bus leaving from san francisco for long beach on march 7th at 4:45 pm. [user] yes, that will work. what station is it going to? [system] it's going to downtown station. i was unable to buy the ticket. does the bus leaving at 4:30 pm work? it costs $42. [user] yes, that will work. [system] the ticket has been bought. [user] what station am i leaving from? [system] the bus is leaving from salesforce transit center. [user] thanks for all your help. could you help me find a rental car there?  [states] 0=7th of march | march 7th 2=sf | san francisco 3=4:30 pm 4=long beach | long beach, ca 5=3    8_00001 12  1

google-cla[bot] commented 1 year ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

descrip commented 1 year ago

Hi Weixuan, could you give some more info on this issue, and on what types of examples are affected? It seems there is a data generation bug, but it's not clear to me what exactly the problem is. Thanks for your help in finding these issues!

alexcoca commented 1 year ago

Hi @descrip! I think @WeixuanZ is pointing out that for turns with multiple frames (i.e. where the service changes), the data generation code produces two examples that, unfortunately, share the same prefix. What we want is two examples whose prefixes correspond to the two services annotated in the frames. Am I right, @WeixuanZ?

If my understanding is correct, your data will contain a few duplicated examples and will be missing some examples where the service changes.

WeixuanZ commented 1 year ago

@descrip thanks for engaging! And @alexcoca is correct.

In the example I included, 8_00001 12 1 is output twice, which happens because the TurnInfo object of dialogue 8_00001 turn 12 frame 0 (service RentalCars_1) is overwritten by that of frame 1 (service Buses_1).

The old code generates duplicates whenever a turn contains more than one frame: the example for every earlier frame is replaced by a copy of the final frame's example.
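
To illustrate the aliasing, here is a minimal sketch of this kind of bug. It is not the repository's actual code: the `TurnInfo` dataclass below, its fields, and the `collect` helper are hypothetical stand-ins, and `copy.deepcopy` is just one way to take a per-frame snapshot. It only shows how reusing one mutable object across frames makes the last frame overwrite the earlier ones.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class TurnInfo:
    """Hypothetical stand-in for the per-turn record; not the real class."""
    dialogue_id: str = ""
    turn_id: str = ""
    frame_id: str = ""
    service: str = ""
    state: dict = field(default_factory=dict)


# Two frames in one turn, mirroring turn 12 of dialogue 8_00001.
FRAMES = [
    {"service": "RentalCars_1", "slot_values": {}},
    {"service": "Buses_1", "slot_values": {"travelers": ["3"]}},
]


def collect(frames, snapshot):
    """Build one record per frame.

    With snapshot=False the same object is appended every time, so later
    mutations also change the records that were appended earlier.
    """
    turn_info = TurnInfo(dialogue_id="8_00001", turn_id="12")
    out = []
    for i, frame in enumerate(frames):
        turn_info.frame_id = str(i)
        turn_info.service = frame["service"]
        turn_info.state = frame["slot_values"]
        # Assumed fix: store an independent copy for each frame.
        out.append(copy.deepcopy(turn_info) if snapshot else turn_info)
    return out


buggy = collect(FRAMES, snapshot=False)
fixed = collect(FRAMES, snapshot=True)

print([(t.service, t.frame_id) for t in buggy])
# [('Buses_1', '1'), ('Buses_1', '1')]
print([(t.service, t.frame_id) for t in fixed])
# [('RentalCars_1', '0'), ('Buses_1', '1')]
```

In the aliased case both list entries are the same object, which matches the symptom above: frame 1 is emitted twice and frame 0 never appears.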

descrip commented 1 year ago

This makes sense. Thanks again for all the work you two are doing on reproducing D3ST. Is it OK if I just leave this issue open? I don't want the data generation scripts to differ from the ones we used in the paper.