About the Generative Prepend Models

facebookresearch / EmpatheticDialogues

Dialogue model that produces empathetic responses when trained on the EmpatheticDialogues dataset.

Other

444 stars 63 forks source link

About the Generative Prepend Models #22

Closed yuboxie closed 4 years ago

yuboxie commented 4 years ago

Hi,

I was wondering if the generative prepend models (EmoPrepend and TopicPrepend) involve any pre-trained BERT weights? From my understanding, it seems that you first trained the prepend models on Reddit and then fine-tuned them on ED, right? And for prepend models, you only experimented with 4-layer transformers but not 5-layer (denoted as "Large" in the paper)?

Another related question would be, when you trained the prepend models on Reddit, you still predicted the labels based on the input context and prepended them in the front, am I correct?

Thanks for your time! Yubo

EricMichaelSmith commented 4 years ago

Sorry for the delay! Must have missed this issue. To address your points:

All of the BERT models used pretrained weights from the pytorch-pretrained-bert package, and then they were trained on Reddit/ED.
Yes, unfortunately we don't have any 5-layer Transformer results for the prepend models.
The code for prepending is in empchat.py, around line 130 - yes, the labels were predicted based on the input context and then prepended to that context.

YuanEric88 commented 4 years ago

Hi, I also have a problem for the generative model. In the paper, you used the full-transformer model for pre-train and fine-tune. I am wondering another generative model - GPT2. Since you haven't released the fine-tuned full-transformer generative model and I don't have enough resources to replicate your outcome for comparison, I would like to ask:

From your perspective, if I use ED to fine-tune GPT2 model, what will be the performance for that(both automated metric and human ratings)? will there be a sacrifice compared with full-transformer model? Since there is no encoder parts in GPT2, but there are multiple layer of decoders. Thanks

EricMichaelSmith commented 4 years ago

See my response at https://github.com/facebookresearch/EmpatheticDialogues/issues/27