Open WeixuanZ opened 1 year ago
Hi Weixuan, could you give some more info on this issue, and what types of examples are affected? Seems that there is a data generation bug, but it's not clear to me what exactly is the problem. Thanks for your help in finding out these issues!
Hi @descrip! My feeling is that @WeixuanZ is pointing out that for turns with multiple frames (i.e. the service changes), the data generation code generates two examples but, unfortunately, with the same prefix. What we want is two examples whose prefixes correspond to the two services annotated in the frames. Am I right @WeixuanZ?
If my understanding is correct, it means that your data will contain a few duplicated examples and miss some examples when the service changes.
@descrip thanks for engaging! And @alexcoca is correct.
In the example I included, `8_00001 12 1` is output twice, which happens because the `TurnInfo` object of dialogue `8_00001` turn 12 frame 0 (service `RentalCars_1`) is overwritten by that of frame 1 (service `Buses_1`).
The old code will generate duplicates whenever a turn contains more than one frame, with earlier frames replaced by the final frame.
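To make the overwrite concrete, here is a minimal sketch of how this kind of duplication can arise. It is not the actual data generation script: the `TurnInfo` fields and the two helper functions are hypothetical, and it assumes the overwrite comes from reusing one mutable object across frames.

```python
import copy
from dataclasses import dataclass


@dataclass
class TurnInfo:
    """Hypothetical stand-in for the script's per-turn state."""
    dialogue_id: str = ""
    turn_id: int = 0
    frame_id: int = 0
    service: str = ""


def examples_buggy(dialogue_id, turn_id, frames):
    """Reuse one TurnInfo for all frames, so the last frame wins."""
    turn_info = TurnInfo(dialogue_id=dialogue_id, turn_id=turn_id)
    collected = []
    for frame_id, frame in enumerate(frames):
        turn_info.frame_id = frame_id
        turn_info.service = frame["service"]
        collected.append(turn_info)  # the same object is appended every time
    return [(t.dialogue_id, t.turn_id, t.frame_id, t.service) for t in collected]


def examples_fixed(dialogue_id, turn_id, frames):
    """Copy the shared state per frame, so each frame keeps its own service."""
    turn_info = TurnInfo(dialogue_id=dialogue_id, turn_id=turn_id)
    collected = []
    for frame_id, frame in enumerate(frames):
        frame_info = copy.copy(turn_info)  # independent object for this frame
        frame_info.frame_id = frame_id
        frame_info.service = frame["service"]
        collected.append(frame_info)
    return [(t.dialogue_id, t.turn_id, t.frame_id, t.service) for t in collected]


# Turn 12 of dialogue 8_00001, reduced to the two services named above.
frames = [{"service": "RentalCars_1"}, {"service": "Buses_1"}]
buggy = examples_buggy("8_00001", 12, frames)
fixed = examples_fixed("8_00001", 12, frames)
```

In this sketch `buggy` holds the frame 1 entry twice, while `fixed` holds one entry per frame, which matches the duplicate described above.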
This makes sense --- thanks again for all the work you two are doing with reproducing D3ST. Is it ok if I just leave this issue open? I don't want the data generation scripts to be different from the ones we used in the paper.
An example of a turn with multiple frames is turn 12 of dialogue `8_00001` (in `sgd/dev/dialogues_008.json`).
Current output:
Output after this commit: