Query regarding data preparation.

facebookresearch / EmpatheticDialogues

Dialogue model that produces empathetic responses when trained on the EmpatheticDialogues dataset.


Query regarding data preparation. #37

Closed by kunalpagarey 3 years ago

kunalpagarey commented 4 years ago

Hi @EricMichaelSmith, I was going through the data preparation for the ED dataset in empchat.py and found that Speaker utterances are also used as labels, with the previous utterances in the conversation as context. I am a little confused about why you would use a Speaker utterance as a label when you only want responses in the Listener role.

Is data preparation different for generation and retrieval tasks?

Please clarify this. Thank you.

EricMichaelSmith commented 4 years ago

Hi! You might want to use the Speaker utterance as a label if you're training a model on both Speaker and Listener utterances, as we do in the paper. (Empirically, we found that results were slightly better, at least for the retrieval model, when we did so.)
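
For readers following along, here is a minimal sketch of the two labeling schemes discussed above. It is not the code from empchat.py: the function `build_examples`, its `listener_only` flag, and the sample conversation are all hypothetical, and it assumes that utterances alternate Speaker, Listener, Speaker, ... with the Speaker going first.

```python
from typing import List, Tuple

Example = Tuple[List[str], str]  # (context utterances, label utterance)


def build_examples(utterances: List[str], listener_only: bool = False) -> List[Example]:
    """Turn one conversation into (context, label) pairs.

    Assumption: utterances[0] is a Speaker turn and roles alternate,
    so odd indices are Listener turns. Hypothetical helper, not the
    repository's implementation.
    """
    examples: List[Example] = []
    # The first utterance has no context, so labels start at index 1.
    for i in range(1, len(utterances)):
        is_listener_turn = i % 2 == 1
        if listener_only and not is_listener_turn:
            continue  # skip Speaker utterances as labels
        examples.append((utterances[:i], utterances[i]))
    return examples


# Made-up conversation, for illustration only.
conversation = [
    "I failed my driving test today.",        # Speaker
    "Oh no, I'm sorry. Will you retake it?",  # Listener
    "Yes, next month.",                       # Speaker
    "You'll do great with more practice.",    # Listener
]

both_roles = build_examples(conversation)
listener_examples = build_examples(conversation, listener_only=True)
print(len(both_roles), len(listener_examples))
```

With both roles included, every utterance after the first becomes a label, so the same conversation yields more training pairs than the Listener-only setting.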

kunalpagarey commented 3 years ago

Hi @EricMichaelSmith, I am assuming that the fine-tuned BERT model provided here https://dl.fbaipublicfiles.com/parlai/empatheticdialogues/models/bert_finetuned.mdl was trained on both roles. Please let me know if that is correct.

Thank you.

EricMichaelSmith commented 3 years ago

Yes @kunalpagarey, that's correct.