Query about additional column in valid and test

rrajasek95 commented 4 years ago

Hello, I was going through the dataset and noticed that the valid and test sets have an additional column containing candidate responses. I don't see this column being referenced in the dataloader.

The original paper mentions that the candidates were sampled from three sources: the training set, DailyDialog, and Reddit conversations. The validation code appears to sample candidates in exactly that way.

I'm not sure whether I'm misreading the code or whether this repo's evaluation code no longer reflects how training and evaluation are done for this dataset. Could you clarify? Thanks in advance for your help!

EricMichaelSmith commented 4 years ago

Hi! Yes, each response in the validation set comes with 100 candidate responses drawn randomly from the validation set (including the gold response), and the same holds for the test set.
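The sampling described above can be sketched roughly as follows. This is a minimal illustration, not the repo's code: the valid and test files already ship with their candidates, and `build_candidate_set`, its parameters, and the choice to deduplicate the pool are all hypothetical. It draws 99 distinct distractors at random from the split's responses, adds the gold response, and shuffles so the gold position is not fixed.

```python
import random
from typing import List


def build_candidate_set(
    gold: str, pool: List[str], num_candidates: int = 100, seed: int = 0
) -> List[str]:
    """Hypothetical helper: gold response plus random distractors from `pool`.

    `pool` is assumed to hold every response in the same split (e.g. the
    validation set). Duplicates and the gold response itself are removed
    before sampling, so all returned candidates are distinct.
    """
    rng = random.Random(seed)
    # sorted() makes the sampling order reproducible for a given seed
    distractors = sorted(set(pool) - {gold})
    if len(distractors) < num_candidates - 1:
        raise ValueError("pool is too small for the requested number of candidates")
    candidates = rng.sample(distractors, num_candidates - 1) + [gold]
    rng.shuffle(candidates)  # hide the gold response's position
    return candidates


if __name__ == "__main__":
    pool = [f"response {i}" for i in range(500)]
    cands = build_candidate_set("response 7", pool)
    print(len(cands), cands.index("response 7"))
```

Fixing the seed keeps a candidate set reproducible, which matters if the goal is to compare against published numbers computed on a fixed set.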

kunalpagarey commented 3 years ago

Hi @EricMichaelSmith, are we supposed to use these provided candidates to evaluate and match the P@1,100 reported in the paper?

Thanks and regards, Kunal Pagarey

EricMichaelSmith commented 3 years ago

Yes @kunalpagarey - as far as I remember, those should be the candidates used to match the paper numbers.
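P@1,100 is the fraction of examples for which the model ranks the gold response first among the 100 candidates. A minimal sketch of that computation follows, assuming the provided candidates have already been parsed into a list and the gold response's index is known. The on-disk layout of the candidate column is not described in this thread, so the `Example` structure is hypothetical, and `word_overlap` is only a toy stand-in for a real model's scoring function.

```python
from typing import Callable, List, NamedTuple, Sequence


class Example(NamedTuple):
    """Hypothetical parsed row: context, its candidate list, and the gold position."""
    context: str
    candidates: List[str]
    gold_index: int


def precision_at_1(
    examples: Sequence[Example], score: Callable[[str, str], float]
) -> float:
    """Fraction of examples whose top-scoring candidate is the gold one.

    With 100 candidates per example, this is P@1,100. Ties are broken in
    favour of the earliest candidate, because max() returns the first maximum.
    """
    if not examples:
        raise ValueError("need at least one example")
    hits = 0
    for ex in examples:
        scores = [score(ex.context, cand) for cand in ex.candidates]
        best = max(range(len(scores)), key=lambda i: scores[i])
        hits += int(best == ex.gold_index)
    return hits / len(examples)


def word_overlap(context: str, candidate: str) -> float:
    """Toy scorer (placeholder for a trained model): count of shared words."""
    return float(len(set(context.lower().split()) & set(candidate.lower().split())))
```

For example, with two toy examples where the scorer picks the gold response in the first but not the second, `precision_at_1` returns 0.5. Reusing the provided candidates rather than resampling them is what makes the numbers comparable with the paper.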

kunalpagarey commented 3 years ago

@EricMichaelSmith You reply really quickly. Thank you so much 😀

facebookresearch / EmpatheticDialogues

Query about additional column in valid and test #29