budzianowski / multiwoz

Source code for end-to-end dialogue model from the MultiWOZ paper (Budzianowski et al. 2018, EMNLP)
MIT License

Add widely used metric: combined score in the leaderboard #84

Status: closed (yxuansu closed this 3 years ago)

yxuansu commented 3 years ago

Add the widely used combined score metric, combined score = 0.5 * (Inform + Success) + BLEU, to the response generation leaderboard as an overall quality measure for comparing systems.
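A minimal sketch of the formula. It assumes Inform and Success are reported as percentages (0-100) and BLEU on a 0-100 scale. The function name `combined_score` is hypothetical and does not come from the repository's evaluation code.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """Combined score = 0.5 * (Inform + Success) + BLEU.

    Assumes Inform and Success are rates in percent (0-100) and BLEU is
    on a 0-100 scale. This is a hypothetical helper, not the repo's code.
    """
    return 0.5 * (inform + success) + bleu


# Hypothetical numbers for illustration only, not results of any real system.
print(combined_score(inform=80.0, success=70.0, bleu=20.0))  # 95.0
```

Inform and Success are averaged, so they contribute equally. BLEU is added at full weight.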

jianguoz commented 3 years ago

@yxuansu Thank you very much for your great contribution. I have reviewed the changes and will merge them. Since the corresponding tables are not currently sorted by combined score, I will also reorganize them later.
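One way to reorganize the tables is to rank the rows by combined score, as sketched below. The row schema (`name`, `inform`, `success`, `bleu` keys) and the helper names are hypothetical. The example numbers are placeholders, not leaderboard results.

```python
from typing import Any, Dict, List


def combined_score(inform: float, success: float, bleu: float) -> float:
    # Combined score = 0.5 * (Inform + Success) + BLEU
    return 0.5 * (inform + success) + bleu


def rank_by_combined_score(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the rows sorted by combined score, highest first.

    Each row is assumed (hypothetical schema) to have "inform", "success"
    and "bleu" keys; a "combined" key is added to each copied row.
    """
    scored = [
        {**row, "combined": combined_score(row["inform"], row["success"], row["bleu"])}
        for row in rows
    ]
    return sorted(scored, key=lambda r: r["combined"], reverse=True)


# Placeholder rows for illustration only.
rows = [
    {"name": "system_a", "inform": 70.0, "success": 60.0, "bleu": 18.0},
    {"name": "system_b", "inform": 80.0, "success": 70.0, "bleu": 15.0},
]
ranked = rank_by_combined_score(rows)
print([(r["name"], r["combined"]) for r in ranked])
```

Python's `sorted` is stable, so systems with equal combined scores keep their original table order.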