Guided Dialogue Policy Learning without Adversarial Learning in the Loop

This is the codebase for the paper "Guided Dialogue Policy Learning without Adversarial Learning in the Loop".

This work has two parts: (1) training a reward model with GAN-VAE, and (2) using the trained reward model to guide dialogue policy learning in ConvLab.

To train either part, go to the corresponding folder.

If you use this code for dialogue policy learning, feel free to cite our publication, "Guided Dialogue Policy Learning without Adversarial Learning in the Loop":

@article{li2020guided,
  title={Guided Dialog Policy Learning without Adversarial Learning in the Loop},
  author={Li, Ziming and Lee, Sungjin and Peng, Baolin and Li, Jinchao and Kiseleva, Julia and de Rijke, Maarten and Shayandeh, Shahin and Gao, Jianfeng},
  journal={Findings of EMNLP 2020},
  year={2020}
}