Closed — zhongpeixiang closed this issue 4 years ago
Hi! Those are human ratings of empathy, relevance, and understandability for the conversations themselves in the dataset. We don't show any results about that in our paper.
Thank you for the quick and helpful response!
Can you please explain which ratings refer to empathy, relevance, and understandability? Why do we have 2 different ratings?
I believe they're in that order (empathy, relevance, and understandability), but I'm not positive because these ratings were collected before I came onto this part of the project. Similarly, my understanding is that these are ratings from two different people per line, separated by an underscore.
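If that interpretation is right, the field can be parsed with a short sketch. Note this is an assumption based on the discussion above (the dimension order and the rater/underscore convention are not confirmed by the authors), and `parse_selfeval` is a hypothetical helper, not part of the dataset's tooling:

```python
# Hypothetical parser for a `selfeval` value such as "4|3|4_3|5|5",
# assuming the format "<rater1>_<rater2>" where each rater's scores are
# "<empathy>|<relevance>|<understandability>" (order per this thread).

def parse_selfeval(value: str) -> list[dict[str, int]]:
    """Split a selfeval string into one score dict per rater."""
    dimensions = ("empathy", "relevance", "understandability")
    raters = []
    for rater in value.split("_"):          # one chunk per annotator
        scores = [int(s) for s in rater.split("|")]
        raters.append(dict(zip(dimensions, scores)))
    return raters

print(parse_selfeval("4|3|4_3|5|5"))
# → [{'empathy': 4, 'relevance': 3, 'understandability': 4},
#    {'empathy': 3, 'relevance': 5, 'understandability': 5}]
```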
So, if I understand correctly, they collected two different ratings per line from MTurk workers. But then how did they collect Table 8, "Human evaluation metrics from rating task for additional experiments," in the paper? It seems only 3 ratings are described there.
Hi @anar-rzayev , Table 8 is where we did human evaluations of the models themselves, not human evaluations of the dataset.
What's the interpretation of the `selfeval` field in the dataset? For example, what's the meaning of `4|3|4_3|5|5`? Thanks