Hi. The EM and DA labels were automatically annotated with fine-tuned classifiers. In particular, the DA classifier may suffer from a domain gap, since it was fine-tuned on EmpatheticDialogues but applied to Reddit. Hence there are some cases that the classifier fails to categorize correctly.
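To make the setup concrete, here is a minimal sketch of what such automatic annotation could look like, assuming an off-the-shelf Hugging Face classifier. The checkpoint name and label list below are placeholders, not the actual fine-tuned models or taxonomy from the paper:

```python
# Minimal sketch: a sequence classifier fine-tuned on EmpatheticDialogues
# would be applied to Reddit utterances to produce DA labels.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

DA_LABELS = ["agreeing", "sympathizing", "questioning"]  # hypothetical subset

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
# In practice this would load the fine-tuned DA checkpoint, not a bare encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(DA_LABELS)
)

def annotate_da(utterance: str) -> str:
    """Predict a dialog-act label for a single utterance."""
    inputs = tokenizer(utterance, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return DA_LABELS[int(logits.argmax(dim=-1))]

# The domain gap shows up here: a head tuned on EmpatheticDialogues can
# assign a confident but wrong label to Reddit-style text like this.
print(annotate_da("well done!"))
```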
Gotcha, thanks for the clarification!
By the way, how much data did you actually annotate? I just want to get a better idea of how much effort is required to train a model like this, in case I want to build one for another language. Thank you!
You can refer to the original paper (https://aclanthology.org/2021.findings-acl.72.pdf), Sections 4.2 and 4.4. I am sorry that I cannot estimate how much data is required to fine-tune a dialog model from a pretrained checkpoint or to train one from scratch.
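For a rough sense of the mechanics (though not of the data requirements), fine-tuning a dialog model from a pretrained checkpoint typically looks like the sketch below. The checkpoint, file name, and hyperparameters are all placeholders, and this is not the authors' training code:

```python
# Generic sketch of fine-tuning a causal-LM dialog model from a checkpoint.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

checkpoint = "microsoft/DialoGPT-small"  # placeholder pretrained checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 tokenizers lack a pad token
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# "dialogs.txt" is a hypothetical file with one flattened dialog per line.
dataset = load_dataset("text", data_files={"train": "dialogs.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=dataset,
    # mlm=False yields standard next-token labels, with padding masked out.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```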
OK, I just went back to Section 4.2 of the paper; of course you wouldn't know. Sorry about my bad question, and thanks for your quick response.
Hey, just wondering if these are considered mistakes in the training data; some of the emotion and dialog-act labels look a bit odd...
For example,
seeker post: couldn't work or face any social situations for months because of depression and self confidence issues. decided to switch careers and chase a passion of cooking i've always had. met a girl. happy again. it definitely gets better people chin up!
seeker em: joy
seeker da: agreeing

response: well done!
response em: admiration
response da: sympathizing
Would you explain why "well done!" would be labeled as a sympathizing act? I have come across many cases like this and started to wonder whether I misunderstood how to use the data.
Please help me understand, thank you!
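For what it's worth, one quick way to audit cases like this is to tally the annotated (DA, EM) pairs over the corpus and eyeball the most frequent combinations. The file name and field names below are hypothetical; they would need to be adapted to the actual release format:

```python
# Count (dialog act, emotion) label pairs on responses to spot-check
# suspicious combinations such as ("sympathizing", "admiration").
import json
from collections import Counter

pairs = Counter()
with open("annotated_dialogs.jsonl", encoding="utf-8") as f:  # hypothetical file
    for line in f:
        record = json.loads(line)  # assumes one JSON record per line
        pairs[(record["response_da"], record["response_em"])] += 1

for (da, em), count in pairs.most_common(10):
    print(f"{da:>15} / {em:<12} {count}")
```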