Closed robinsongh381 closed 4 years ago
Hi, we downloaded the dataset from https://www.aclweb.org/anthology/P18-1205/ and followed its split into train, valid and test. According to the author, the personas in test and valid should not be included in train.
Hello
Train, test and valid personas (or "tasks") are computed by
The length of `test` is 100, which means there are 100 distinct personas. However, 99 of them are present in `train`, and similarly 99 of the 99 personas in `valid` are also present in `train`.
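A minimal sketch of how such an overlap check could be reproduced with Python sets, assuming each split's personas are available as a collection of hashable IDs. The function name `persona_overlap` and the toy values below are hypothetical and are not taken from the repository:

```python
def persona_overlap(split_a, split_b):
    """Return the personas that occur in both splits."""
    return set(split_a) & set(split_b)


# Toy stand-ins for the real persona IDs (hypothetical values).
train = {1, 2, 3, 4, 5}
valid = {4, 5, 6}
test = {4, 5, 7}

# Number of test personas that also appear in train.
print(len(persona_overlap(test, train)))

# Personas that appear in exactly one of valid / test.
print(sorted(set(valid) ^ set(test)))
```

If the splits were truly disjoint, `persona_overlap(test, train)` and `persona_overlap(valid, train)` would both be empty.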
In addition, `valid` and `test` differ by only one persona (62), so they are almost the same task.
Q1. Why is nearly every persona in `test` and `valid` also present in `train`? I thought data present in `train` should not appear in either `test` or `valid`.
Q2. Why did you make `valid` and `test` have almost the same personas?
Thanks