I've been digging into your paper and had a couple of questions about the data sources for eval.json and train.json. You mentioned that you started with 1,000 conversations using GPT-3 when collecting twenty questions, and then used the two finetuned FLAN-T5-XL models to generate 100K more chats.
So, here's where I'm a bit curious:
With the conversations coming out of the FLAN-T5-XL model, can we assume they're all valid? I'm concerned that some instances may have a wrong YES / NO answer to a question.
Am I right in thinking that eval.json is the product of GPT-3, and train.json comes from FLAN-T5-XL?
Would really appreciate your take on these points! Thank you in advance 😄
Hi @scottsuk0306, thank you for your questions and apologies for the delayed response.
The answers match GPT-3.5's with near-perfect exact-match accuracy (0.995, to be precise), so yes, you may assume they are all valid as long as the questions are within the training distribution.
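For anyone wanting to reproduce that check, here is a minimal sketch of computing exact-match accuracy between model answers and reference answers. The function name and the normalization (case- and whitespace-insensitive) are assumptions for illustration, not the paper's actual validation code:

```python
def exact_match_accuracy(model_answers, reference_answers):
    """Fraction of answers that exactly match the reference,
    after lowercasing and stripping surrounding whitespace.
    (Hypothetical helper; not from the paper's codebase.)"""
    matches = sum(
        a.strip().lower() == b.strip().lower()
        for a, b in zip(model_answers, reference_answers)
    )
    return matches / len(model_answers)

# Toy usage: two of four answers agree with the reference.
score = exact_match_accuracy(
    ["Yes", "No", "Yes", "no "],
    ["yes", "no", "no", "yes"],
)
print(score)  # 0.5
```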
No, eval.json is also generated by FLAN-T5-XL; it is used for evaluating RL algorithms on the datasets. Hope this helps!
Let me know if you have more questions!