alexa / dialoglue

DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
https://evalai.cloudcv.org/web/challenges/challenge-page/708/overview
Apache License 2.0
279 stars, 25 forks

DST #10

Closed (smartyfh closed 3 years ago)

smartyfh commented 3 years ago

Hi,

Thanks for the nice work. Since the dialogue state tracking evaluation builds on TripPy, I wonder how MultiWOZ is preprocessed. I found that TripPy's original preprocessing script may change some ground-truth labels (e.g., the time-slot value "10:30" may be reduced to just "10"). If convenient, could you please run a simple comparison of the ground-truth labels before and after preprocessing? Since you are launching a leaderboard, I hope the evaluation can be as precise as possible. Thanks!
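For reference, the kind of comparison I have in mind could look roughly like the sketch below. It is only an illustration, not TripPy's or DialoGLUE's actual code: the `Labels` layout, the `diff_labels` helper, the dialogue ID, and the toy values are all assumptions made up for this example. In practice the two dictionaries would be loaded from the raw MultiWOZ annotations and from the output of the preprocessing script.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (an assumption for this sketch):
# (dialogue_id, turn_index) -> {slot_name: value}
Labels = Dict[Tuple[str, int], Dict[str, str]]


def diff_labels(before: Labels, after: Labels) -> List[Tuple[str, int, str, str, str]]:
    """List every label that differs between the two label sets.

    Each entry is (dialogue_id, turn_index, slot, value_before, value_after).
    A slot present on only one side is reported with "<missing>" on the other.
    """
    changes = []
    for key in sorted(set(before) | set(after)):
        slots_before = before.get(key, {})
        slots_after = after.get(key, {})
        for slot in sorted(set(slots_before) | set(slots_after)):
            old = slots_before.get(slot, "<missing>")
            new = slots_after.get(slot, "<missing>")
            if old != new:
                changes.append((key[0], key[1], slot, old, new))
    return changes


# Toy data mirroring the "10:30" -> "10" case described above (made-up dialogue ID).
before = {("dialogue_001", 0): {"train-leaveAt": "10:30", "train-day": "monday"}}
after = {("dialogue_001", 0): {"train-leaveAt": "10", "train-day": "monday"}}

changes = diff_labels(before, after)
for change in changes:
    print(change)
```

Counting the entries of `changes` per slot would show whether the effect is limited to time slots or is more widespread.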

mihail-amazon commented 3 years ago

Thanks for the note. We will look into that effect once we get some time. In the meantime, please keep us posted if you make any interesting observations based on this!