The BERT encoder tries to minimize the negative log-likelihood between y and ŷ. In this case, ŷ is the ground-truth response for each input x, and y is the response predicted by the BERT encoder model. Is that right?
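For reference, here is a minimal sketch of what such a negative log-likelihood objective could look like with a BERT encoder. This is only my assumption of the setup (a bi-encoder scoring candidate responses with in-batch negatives, mean pooling, and a `bert-base-uncased` checkpoint); the names and batching are illustrative, not the authors' exact method.

```python
# Sketch only: one possible NLL / cross-entropy objective for a
# retrieval-style (encoder-only) phase. Model choice, pooling, and
# in-batch negatives are assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode(texts):
    # Mean-pool BERT's last hidden states into one vector per text.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state        # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)       # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)        # (B, H)

contexts = ["I failed my exam today.", "I just got a new puppy!"]
responses = ["That sounds really tough, I'm sorry.", "That's wonderful news!"]

ctx_vec = encode(contexts)     # x  -> encoder
resp_vec = encode(responses)   # y_hat: ground-truth responses

# Score every context against every candidate response; the correct
# candidate for context i is response i (in-batch negatives).
scores = ctx_vec @ resp_vec.T                 # (B, B)
labels = torch.arange(len(contexts))
loss = F.cross_entropy(scores, labels)        # mean negative log-likelihood
print(loss.item())
```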
And the other phase is generation-based. I marked it as a BERT decoder, because BERT doesn't have a decoder of its own, so do we train a Transformer as a decoder to produce a sentence from the BERT encoder's output?
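As a hedged illustration of that second phase, the sketch below pairs a BERT encoder with a Transformer decoder via Hugging Face's `EncoderDecoderModel`, which warm-starts a BERT-shaped decoder with cross-attention over the encoder outputs. This is just one way to realize "BERT encoder + trained decoder"; I am not claiming it is the exact architecture used here.

```python
# Sketch only: one way to put a Transformer decoder on top of a BERT
# encoder for generation, using Hugging Face's EncoderDecoderModel.
# The checkpoint and training details are assumptions for illustration.
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Encoder is plain BERT; the decoder is a BERT-shaped stack adapted into
# a causal decoder with cross-attention over the encoder outputs.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Training step: the decoder is trained with cross-entropy (NLL) on the
# ground-truth response tokens while attending to the encoder's output.
inputs = tokenizer("I failed my exam today.", return_tensors="pt")
labels = tokenizer("That sounds really tough.", return_tensors="pt").input_ids
loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss

# Inference: generate a sentence from the BERT encoder's output.
generated = model.generate(inputs.input_ids, max_length=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```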
I also mentioned before that the Transformer has many architectures now (e.g. on Hugging Face), so it is confusing for anyone who comes to this method.
I hope you can answer these questions.