abdulhaim / LMRL-Gym


Clarification on the Twenty Questions Dataset #12

Closed scottsuk0306 closed 4 months ago

scottsuk0306 commented 5 months ago

Hello,

I've been digging into your paper and had a couple of questions about the data sources for eval.json and train.json. You mention that, when collecting the Twenty Questions data, you started with 1,000 conversations generated with GPT-3 and then used the two fine-tuned FLAN-T5-XL models to generate 100K more conversations.
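
For context, here is a minimal sketch of how I picture that two-model loop; the checkpoint paths, prompt formats, and stopping rule are my own placeholders, not necessarily your actual setup:

```python
# Rough sketch of the self-play loop as I understand it: one fine-tuned
# FLAN-T5-XL asks the questions (guesser) and the other answers Yes/No
# for a hidden word (oracle). All paths and prompts below are guesses.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("google/flan-t5-xl")
guesser = AutoModelForSeq2SeqLM.from_pretrained("path/to/guesser")  # placeholder
oracle = AutoModelForSeq2SeqLM.from_pretrained("path/to/oracle")    # placeholder

def generate(model, prompt: str) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=32, do_sample=True)
    return tok.decode(out[0], skip_special_tokens=True)

def play_game(word: str, max_turns: int = 20) -> list[tuple[str, str]]:
    history: list[tuple[str, str]] = []
    for _ in range(max_turns):
        transcript = " ".join(f"Q: {q} A: {a}" for q, a in history)
        question = generate(guesser, f"Ask a new yes/no question. {transcript}")
        answer = generate(oracle, f'The object is "{word}". {question} Yes or no?')
        history.append((question, answer))
        if word.lower() in question.lower():  # crude end-of-game heuristic
            break
    return history
```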

So, here's where I'm a bit curious:

  1. Can we assume that all of the conversations coming out of the FLAN-T5-XL models are valid? I'm concerned that some instances may have the wrong YES/NO answer to a question.
  2. Am I right in thinking that eval.json is the product of GPT-3 and that train.json comes from FLAN-T5-XL?

Would really appreciate your take on these points! Thank you in advance 😄

icwhite commented 4 months ago

Hi @scottsuk0306, thank you for your questions, and apologies for the delayed response.

  1. The answers match GPT-3.5's with near-100% exact-match accuracy (0.995, to be precise), so yes, you may assume they are all valid as long as the questions are within the training distribution (a rough sketch of such a check is below).
  2. No, eval.json is also from FLAN-T5-XL and is used for evaluating RL algorithms on the datasets. Hope this helps! Let me know if you have more questions!
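
For reference, here is a minimal sketch of what such a validity check could look like, assuming the OpenAI Python client (v1+) and hypothetical `word` / `lines` fields in the JSON; the real schema and prompt may differ:

```python
# Minimal sketch of an exact-match validity check: re-ask each recorded
# question about the hidden word with GPT-3.5 and compare the answer to
# the one stored in the dataset. The file name is from this thread, but
# the "word" and "lines" fields and the prompt are assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def oracle_answer(word: str, question: str) -> str:
    """Ask GPT-3.5 for a strict Yes/No answer about the hidden word."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{
            "role": "user",
            "content": f'The hidden object is "{word}". '
                       f'Answer only "Yes." or "No.". {question}',
        }],
    )
    return resp.choices[0].message.content.strip()

with open("eval.json") as f:
    conversations = json.load(f)

matches = total = 0
for conv in conversations:
    word = conv["word"]                      # hypothetical field name
    for question, answer in conv["lines"]:   # hypothetical (question, answer) pairs
        matches += int(oracle_answer(word, question) == answer)
        total += 1

print(f"Exact-match accuracy vs. GPT-3.5: {matches / total:.3f}")
```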