From what I understand, there seem to be 3 schools of thought on how to structure the data you feed to the model.
The first one, which I guess is the multi-turn way (I believe this is also called next sentence prediction?), structures it like this:
x: <start token> msg1 y: msg2 <end of sentence>
x1: <start token> msg1, msg2 y: msg3 <end of sentence>
... and so on
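To make the first scheme concrete, here is a minimal Python sketch of how I picture the pairs being built. The function name `multi_turn_pairs` and the token strings are placeholders I made up for illustration; they are not ParlAI's API or its actual special tokens.

```python
from typing import List, Tuple

# Placeholder token strings (assumption), not ParlAI's real special tokens.
START = "<start token>"
EOS = "<end of sentence>"


def multi_turn_pairs(messages: List[str]) -> List[Tuple[str, str]]:
    """Build one (x, y) pair per turn: x is the history so far, y is the next message."""
    pairs = []
    for i in range(1, len(messages)):
        x = f"{START} " + ", ".join(messages[:i])  # all messages before turn i
        y = f"{messages[i]} {EOS}"                 # the message to predict, closed with EOS
        pairs.append((x, y))
    return pairs


for x, y in multi_turn_pairs(["msg1", "msg2", "msg3"]):
    print(f"x: {x}  y: {y}")
```

With 3 messages this gives two training examples, each one predicting only the next message from the history before it.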
Then there is a minor variation: assuming the conversation is only 3 messages, you only apply EOS at the end of the conversation.
Then there is the teacher forcing way, assuming 3 messages:
x: <start token> msg1, msg2 y: msg3 <end of sentence>
y: msg1, msg2, msg3 <end of sentence>
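As far as I understand teacher forcing, the decoder is fed the gold target shifted right by one position and is trained to predict the unshifted target. Below is a rough Python sketch of that for the whole-conversation target. The function name `teacher_forcing_example` is hypothetical, and each message stands in for its tokens (a real tokenizer would split them further), so this only shows the shape of the data and is not how ParlAI actually does it.

```python
from typing import List, Tuple

# Placeholder token strings (assumption), not ParlAI's real special tokens.
START = "<start token>"
EOS = "<end of sentence>"


def teacher_forcing_example(messages: List[str]) -> Tuple[List[str], List[str]]:
    """Treat the whole conversation as one target sequence.

    Returns (decoder_input, labels), where decoder_input is the gold target
    shifted right by one position and prefixed with the start token.
    """
    labels = messages + [EOS]              # what the model should predict at each step
    decoder_input = [START] + labels[:-1]  # gold previous item fed in at each step
    return decoder_input, labels


decoder_input, labels = teacher_forcing_example(["msg1", "msg2", "msg3"])
print(decoder_input)
print(labels)
```

Here EOS appears only once, at the end of the conversation, so at generation time the model would keep producing messages until it emits EOS.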
I am wondering what you guys do in ParlAI, and which one is better in your opinion?
@stephenroller @klshuster
Thanks!
From what I understand, the third one allows for longer generations but requires parsing the output, since the model can also generate your own side of the conversation,
while the first one only generates the next sentence.