I've been digging into your paper and had a couple of questions about the data sources for eval.json and train.json. You mentioned that you started with 1,000 conversations using GPT-3 when collecting twenty questions, and then used the two finetuned FLAN-T5-XL models to generate 100K more chats.
So, here's where I'm a bit curious:
With the conversations coming out of the FLAN-T5-XL model, can we assume they're all valid? I'm concerned that some instances may have a wrong YES / NO answer to a question.
Am I right in thinking that eval.json is the product of GPT-3, and train.json comes from FLAN-T5-XL?
Would really appreciate your take on these points! Thank you in advance 😄
Hi @scottsuk0306, thank you for your questions and apologies for the delayed response.
The answers match GPT-3.5's with near-perfect exact-match accuracy (0.995, to be precise), so yes, you may assume they are all valid as long as the questions are within the training distribution.
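For anyone wanting to reproduce that check, here is a minimal sketch of computing exact-match accuracy between model answers and reference answers. The function name and the normalization (case- and whitespace-insensitive) are assumptions for illustration, not the paper's actual validation code:

```python
def exact_match_accuracy(model_answers, reference_answers):
    """Fraction of answers that exactly match the reference,
    after lowercasing and stripping surrounding whitespace.
    (Hypothetical helper; not from the paper's codebase.)"""
    matches = sum(
        a.strip().lower() == b.strip().lower()
        for a, b in zip(model_answers, reference_answers)
    )
    return matches / len(model_answers)

# Toy usage: two of four answers agree with the reference.
score = exact_match_accuracy(
    ["Yes", "No", "Yes", "no "],
    ["yes", "no", "no", "yes"],
)
print(score)  # 0.5
```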
No, eval.json is also generated by FLAN-T5-XL; it is used for evaluating RL algorithms on the datasets. Hope this helps!
Let me know if you have more questions!