Closed zhongpeixiang closed 5 years ago

Do you have any plans to release the code for the generative model using the Transformer? I trained the full Transformer on the Reddit dataset but got random responses. I got a low cross-entropy loss on my validation set, so I don't know why this is the case.

Thanks, Peixiang
Hi there! We may release the code for the generative model at some point, but likely not in the next few months. Hmm - how are you training the full Transformer on Reddit to get random responses?
Basically I followed this guide to implement the Transformer. I got a very low validation cross-entropy loss per token (around 0.8) but random test responses when decoding with top-k sampling (roughly the sampling step in the sketch below). My model has 20M parameters, and the dataset has around 1M conversations. A few modifications from the guide are:
Thanks, Peixiang
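For reference, a top-k decoding step typically looks something like the minimal PyTorch sketch below; the k and temperature values are illustrative placeholders, not the exact settings used above.

```python
import torch
import torch.nn.functional as F

def top_k_sample(logits, k=40, temperature=1.0):
    """Sample one token id from the k most likely next tokens.

    logits: 1-D tensor of unnormalized scores over the vocabulary.
    k=40 and temperature=1.0 are illustrative defaults, not tuned values.
    """
    logits = logits / temperature
    # Restrict the distribution to the k highest-scoring tokens.
    topk_values, topk_indices = torch.topk(logits, k)
    probs = F.softmax(topk_values, dim=-1)
    # Sample within the top-k set, then map back to a vocabulary id.
    choice = torch.multinomial(probs, num_samples=1)
    return topk_indices[choice].item()

# Usage in a decoding loop (model, bos_id, and max_len are assumed to be
# defined elsewhere; the model returns next-token logits for a prefix):
# tokens = [bos_id]
# for _ in range(max_len):
#     logits = model(torch.tensor([tokens]))[0, -1]
#     tokens.append(top_k_sample(logits))
```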
Hi Peixiang! You shouldn't need to re-implement the Transformer - have you used the Transformer provided in this repo (see the training call at https://github.com/facebookresearch/EmpatheticDialogues#pretraining )?
Thank you for the guide :).
Have you experimented with training the generative Transformer on the EmpatheticDialogues dataset only, i.e., without pre-training on the 1.7B Reddit dataset? If yes, how is the quality of the responses?
Sure thing =) No, I don't think I've tried that, but I'd expect the quality of responses to be very poor - pre-training exposes the model to millions of examples of human dialogue so that it can more easily capture relevant information during fine-tuning. (See the original BERT paper for an example of this.)
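To make the pre-train-then-fine-tune schedule concrete, here is a minimal, self-contained PyTorch sketch; the tiny model and the random placeholder batches are purely illustrative and are not the training code from this repo.

```python
import torch
import torch.nn as nn

# Tiny stand-in language model; the architecture is beside the point. What
# matters is that fine-tuning keeps training the SAME weights that were
# learned during pre-training, rather than starting from scratch.
class TinyLM(nn.Module):
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, vocab)

    def forward(self, tokens):  # (batch, seq) -> (batch, seq, vocab) logits
        return self.proj(self.embed(tokens))

def train(model, batches, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for tokens in batches:  # next-token prediction objective
        logits = model(tokens[:, :-1])
        loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                       tokens[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

model = TinyLM()
# Random placeholder batches standing in for Reddit / EmpatheticDialogues.
reddit_batches = [torch.randint(0, 100, (8, 16)) for _ in range(10)]
empathetic_batches = [torch.randint(0, 100, (8, 16)) for _ in range(2)]

train(model, reddit_batches)               # 1) pre-train on the large corpus
torch.save(model.state_dict(), "pretrained.pt")

model.load_state_dict(torch.load("pretrained.pt"))
train(model, empathetic_batches, lr=1e-4)  # 2) fine-tune the same weights
```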