facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License
10.49k stars 2.1k forks

Question: about how the data is fed in for Decoder only models #4931

Closed ArEnSc closed 1 year ago

ArEnSc commented 1 year ago

From what I understand, there seem to be 3 schools of thought on how to feed the model data. The first one is structured like the example below. I guess this is called the multi-turn way, and I believe it is also called next sentence prediction?

x: <start token> msg1   y: msg2 <end of sentence>
x1: <start token> msg1, msg2   y: msg3 <end of sentence>

... and so on. Then there is a minor variation: assuming the conversation is only 3 messages, you apply EOS only at the end of the conversation.

x: <start token> msg1   y: msg2
x1: <start token> msg1, msg2   y: msg3 <end of sentence>

Then there is the teacher forcing way, again assuming 3 messages:
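
The two per-turn layouts above can be sketched roughly as follows. This is only an illustration, not ParlAI code: the function name `per_turn_pairs` and the flag `eos_every_turn` are hypothetical, and each message is treated as a single item rather than a real token sequence.

```python
# Hypothetical sketch: build (x, y) pairs where x is the history so far
# and y is the next message. Not taken from ParlAI.
START = "<start token>"
EOS = "<end of sentence>"


def per_turn_pairs(messages, eos_every_turn=True):
    """Return one (x, y) pair per message after the first.

    eos_every_turn=True  -> EOS closes every target (first layout).
    eos_every_turn=False -> EOS closes only the final target (variation).
    """
    pairs = []
    for i in range(1, len(messages)):
        x = [START] + list(messages[:i])   # everything said so far
        y = [messages[i]]                  # the next message to predict
        is_last = i == len(messages) - 1
        if eos_every_turn or is_last:
            y.append(EOS)
        pairs.append((x, y))
    return pairs


conversation = ["msg1", "msg2", "msg3"]
first_way = per_turn_pairs(conversation, eos_every_turn=True)
second_way = per_turn_pairs(conversation, eos_every_turn=False)

for x, y in first_way:
    print("x:", x, "y:", y)
```

The only difference between the two calls is whether intermediate targets end with EOS, which is the distinction drawn in the two examples above.
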

x: <start token> msg1, msg2   y: msg3 <end of sentence>

y: msg1, msg2, msg3 <end of sentence>
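
One common reading of this whole-conversation layout for a decoder-only model is that the full dialogue becomes a single sequence and the target is the same sequence shifted left by one position. The sketch below assumes that reading; `whole_conversation_example` is a hypothetical name, messages stand in for tokens, and nothing here is confirmed to be what ParlAI does.

```python
# Hypothetical sketch: one training example per conversation, with the
# target equal to the input shifted by one step (teacher forcing).
START = "<start token>"
EOS = "<end of sentence>"


def whole_conversation_example(messages):
    """Return (inputs, targets) for a single concatenated conversation."""
    seq = [START] + list(messages) + [EOS]
    inputs = seq[:-1]    # <start token>, msg1, msg2, msg3
    targets = seq[1:]    # msg1, msg2, msg3, <end of sentence>
    return inputs, targets


inputs, targets = whole_conversation_example(["msg1", "msg2", "msg3"])
print("x:", inputs)
print("y:", targets)
```

At each position the model sees the ground-truth prefix and is trained to predict the next item, so every message in the conversation contributes to the loss rather than only the last one.
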

I am wondering what you do in ParlAI and which one is better in your opinion? @stephenroller @klshuster

Thanks! From what I understand, the third one allows for longer generations but requires parsing the output, since the model can generate your side of the conversation as well, while the first one only generates the next sentence.

ArEnSc commented 1 year ago

Never mind, I asked ChatGPT and it let me know I was on the right track. The choice also seems to be arbitrary.