circlePi / knowledge-driven-dialogue-lic2019

2019 语言与智能技术竞赛-知识驱动对话 B榜第5名源码和模型
25 stars 4 forks source link

knowledge-driven-dialogue-2019-lic

2019语言与智能技术竞赛知识驱动对话 B榜第5名方案
由于线上部署对时间有要求,最终提交人工评估的版本删掉了一些全局主题特征,导致模型结果有所下降,最终人工评估第9名。A榜第四 B榜第五

Overview

For building a proactive dialogue chatbot, we used a so-called generation-reranking method. First, the generative models(Multi-Seq2Seq) produce some candidate replies. Next, the re-ranking model is responsible for performing query-answer matching, to choice a reply as informative as possible over the produced candidates. A detailed paper to describle our solution is now avaliable at https://arxiv.org/pdf/1907.03590.pdf, please check.

Data Augmentation

We used four data augmentation techniques, Entity Generalization,Knowledge Selection,Switch,Conversation Extraction to construct multiple different dataset for training Seq2Seq models. One can use the scripts Seq2Seq/preclean_*.py to with slight modification of parameters to get 6 datasets.

Seq2Seq Model

For ensemble purpose we choose different encoders and decoders, i.e. LSTM cells and the Transformer.

Training