HLTCHKUST / PAML

Personalizing Dialogue Agents via Meta-Learning
MIT License

Persona overlap between train, test and valid #10

Closed robinsongh381 closed 4 years ago

robinsongh381 commented 4 years ago

Hello

The train, test and valid personas (or "tasks") are obtained with

train = p.get_personas('train')
test = p.get_personas('test')
valid = p.get_personas('valid')

The length of test is 100, which means it contains 100 distinct personas. However, 99 of them are also present in train, and similarly 99 of the 99 personas in valid are also present in train.

In addition, valid and test differ by only one persona (62), so they are almost the same task.
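For anyone who wants to reproduce these counts, here is a minimal sketch of one way to measure the overlap between two splits. It is not code from the repository: `persona_key` and `persona_overlap` are hypothetical helpers, and the sketch assumes that each persona is either a hashable value or a list of sentences whose order does not matter. The toy lists stand in for the output of `p.get_personas(...)`, whose actual return type may differ.

```python
def persona_key(persona):
    """Turn a persona into a hashable key.

    Assumption: a persona is either already hashable (e.g. an int or str)
    or a collection of sentences, in which case sentence order is ignored.
    """
    if isinstance(persona, (list, tuple, set)):
        return tuple(sorted(persona))
    return persona


def persona_overlap(split_a, split_b):
    """Return (shared, only_in_a, only_in_b) as sets of persona keys."""
    a = {persona_key(p) for p in split_a}
    b = {persona_key(p) for p in split_b}
    return a & b, a - b, b - a


# Toy stand-ins for p.get_personas('train') and p.get_personas('test').
train = [["i like tea.", "i have a dog."], ["i am a chef."], ["i run daily."]]
test = [["i have a dog.", "i like tea."], ["i play chess."]]

shared, only_test, only_train = persona_overlap(test, train)
total_test = len(shared) + len(only_test)
print(f"{len(shared)} of {total_test} test personas also appear in train")
```

Running the same function on (valid, train) and (valid, test) would give the other two comparisons mentioned above.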

Q1. Why is every persona in test and valid also present in train? I thought data present in train should not appear in either test or valid.

Q2. Why do valid and test have almost the same personas?

Thanks

zlinao commented 4 years ago

Hi, we downloaded the dataset from https://www.aclweb.org/anthology/P18-1205/ and followed their split into train, valid and test. According to the author, the personas in test and valid should not be included in train.