Closed zhongpeixiang closed 5 years ago

Do you have any plans to release the code for the generative model using the Transformer? I trained the full Transformer on the Reddit dataset but got random responses. I got a low cross-entropy loss on my validation set, so I don't know why this is the case.

Thanks, Peixiang
Hi there! We may release the code for the generative model at some point, but likely not in the next few months. Hmm - how are you training the full Transformer on Reddit to get random responses?
Basically I followed this guide to implement the Transformer. I got a very low validation cross-entropy loss per token (around 0.8) but random test responses when decoding with top-k sampling (roughly the sampling step in the sketch below). My model has 20M parameters, and the dataset has around 1M conversations. A few modifications from the guide are:
Thanks, Peixiang
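For reference, a top-k decoding step typically looks something like the minimal PyTorch sketch below; the k and temperature values are illustrative placeholders, not the exact settings used above.

```python
import torch
import torch.nn.functional as F

def top_k_sample(logits, k=40, temperature=1.0):
    """Sample one token id from the k most likely next tokens.

    logits: 1-D tensor of unnormalized scores over the vocabulary.
    k=40 and temperature=1.0 are illustrative defaults, not tuned values.
    """
    logits = logits / temperature
    # Restrict the distribution to the k highest-scoring tokens.
    topk_values, topk_indices = torch.topk(logits, k)
    probs = F.softmax(topk_values, dim=-1)
    # Sample within the top-k set, then map back to a vocabulary id.
    choice = torch.multinomial(probs, num_samples=1)
    return topk_indices[choice].item()

# Usage in a decoding loop (model, bos_id, and max_len are assumed to be
# defined elsewhere; the model returns next-token logits for a prefix):
# tokens = [bos_id]
# for _ in range(max_len):
#     logits = model(torch.tensor([tokens]))[0, -1]
#     tokens.append(top_k_sample(logits))
```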
Hi Peixiang! You shouldn't need to re-implement the Transformer - have you used the Transformer provided in this repo (see the training call at https://github.com/facebookresearch/EmpatheticDialogues#pretraining )?
Thank you for the guide :).
Have you experimented with training the generative Transformer on the EmpatheticDialogues dataset only, i.e., without pre-training on the 1.7B Reddit dataset? If yes, how is the quality of the responses?
Sure thing =) No, I don't think I've tried that, but I'd expect the quality of responses to be very poor - pre-training exposes the model to millions of examples of human dialogue so that it can more easily capture relevant information during fine-tuning. (See the original BERT paper for an example of this.)
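To make the pre-train-then-fine-tune schedule concrete, here is a minimal, self-contained PyTorch sketch; the tiny model and the random placeholder batches are purely illustrative and are not the training code from this repo.

```python
import torch
import torch.nn as nn

# Tiny stand-in language model; the architecture is beside the point. What
# matters is that fine-tuning keeps training the SAME weights that were
# learned during pre-training, rather than starting from scratch.
class TinyLM(nn.Module):
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, vocab)

    def forward(self, tokens):  # (batch, seq) -> (batch, seq, vocab) logits
        return self.proj(self.embed(tokens))

def train(model, batches, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for tokens in batches:  # next-token prediction objective
        logits = model(tokens[:, :-1])
        loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                       tokens[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

model = TinyLM()
# Random placeholder batches standing in for Reddit / EmpatheticDialogues.
reddit_batches = [torch.randint(0, 100, (8, 16)) for _ in range(10)]
empathetic_batches = [torch.randint(0, 100, (8, 16)) for _ in range(2)]

train(model, reddit_batches)               # 1) pre-train on the large corpus
torch.save(model.state_dict(), "pretrained.pt")

model.load_state_dict(torch.load("pretrained.pt"))
train(model, empathetic_batches, lr=1e-4)  # 2) fine-tune the same weights
```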