Model
DialoGPT-medium extends GPT-2-medium by fine-tuning on Reddit data to model dialogue. Speaker changes are marked with the EOS token, represented by the [SEP] token in the input; supporting this required some modifications to get_surprisals.py and tokenizer.py (a sketch of the mapping follows the citation below).
@inproceedings{zhang-etal-2020-dialogpt,
title = "{DIALOGPT}: Large-Scale Generative Pre-training for Conversational Response Generation",
author = "Zhang, Yizhe and
Sun, Siqi and
Galley, Michel and
Chen, Yen-Chun and
Brockett, Chris and
Gao, Xiang and
Gao, Jianfeng and
Liu, Jingjing and
Dolan, Bill",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-demos.30",
doi = "10.18653/v1/2020.acl-demos.30",
pages = "270--278"
}
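The following is a minimal sketch of how the [SEP]-to-EOS mapping and the surprisal computation might look. It assumes the Hugging Face transformers and PyTorch APIs and the public microsoft/DialoGPT-medium checkpoint; the sample text and variable names are illustrative, not taken from the actual get_surprisals.py.

```python
# Illustrative sketch (not the actual get_surprisals.py): map the [SEP]
# speaker-change marker to DialoGPT's EOS token, then compute per-token
# surprisals under the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
model.eval()

text = "Does money buy happiness? [SEP] Depends how much you spend."
# [SEP] marks a speaker change; DialoGPT expects its EOS token there.
text = text.replace("[SEP]", tokenizer.eos_token)
input_ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits

# Surprisal of each token given its left context; position i of the
# logits predicts token i + 1, so the first token receives no score.
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
target_log_probs = log_probs.gather(1, input_ids[0, 1:, None]).squeeze(-1)
surprisals = -target_log_probs / torch.log(torch.tensor(2.0))

for tok, s in zip(tokenizer.convert_ids_to_tokens(input_ids[0, 1:].tolist()), surprisals):
    print(f"{tok}\t{s.item():.2f}")
```

The division by log 2 converts natural-log probabilities into surprisal in bits.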
Are you the creator/co-creator of this language model? No.
Are you the creator/co-creator of this implementation of this language model? No.
What licensing restrictions (if any) apply to this implementation of this language model? MIT License, Copyright (c) Microsoft Corporation.
Training
What corpus was this model trained on?
147M conversation-like exchanges extracted from Reddit comment chains spanning 2005 through 2017, totaling 1.8 billion words, with a vocabulary of 50,257 tokens (see Zhang et al. (2020) for details).
What task was this model trained on?
It extends the Hugging Face PyTorch transformer implementation with a next-word prediction objective and an additional Mutual Information Maximization (MMI) objective.
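As a rough sketch of the MMI idea, a candidate response can be scored by the backward likelihood of the source given the response, keeping the best-scoring candidate. Everything below (the function names, the backward_model argument) is illustrative; Zhang et al. (2020) train a separate reverse DialoGPT model for this purpose, which is not assumed here.

```python
# Hedged sketch of MMI reranking: prefer the candidate response that makes
# the source most probable under a backward model P(source | response).
# The backward model is a stand-in; this is not the released implementation.
import torch

def sequence_log_prob(model, tokenizer, context: str, continuation: str) -> float:
    """Sum of log P(continuation tokens | context + EOS) under a causal LM."""
    ctx_ids = tokenizer(context + tokenizer.eos_token, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, cont_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Position i of the logits predicts token i + 1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = input_ids[0, 1:]
    start = ctx_ids.shape[1] - 1  # index of the first continuation token
    return log_probs[start:].gather(1, targets[start:, None]).sum().item()

def mmi_rerank(backward_model, tokenizer, source, candidates):
    # Backward score: how well does the candidate "explain" the source?
    return max(candidates,
               key=lambda c: sequence_log_prob(backward_model, tokenizer, c, source))
```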
If possible, provide some standard performance measures (e.g. test perplexity) and complexity measures (e.g. parameter count, number of layers, etc.).
Parameters: 345M, Layers: 24, Embedding size: 1024, Batch size: 64
See Tables 2 & 3 in Zhang et al. (2020) for performance measures.
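Assuming the checkpoint loaded in the first snippet above, the complexity figures can be sanity-checked directly; the exact parameter count may differ slightly from the rounded 345M figure.

```python
# Quick check of the reported complexity measures for the loaded model.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters, {model.config.n_layer} layers, "
      f"embedding size {model.config.n_embd}")
```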