Help regarding dataset.

HarshitSoni1903 commented 5 years ago

I would like to know more about how the text data from the NTCIR Short Text Conversation Task (STC-3) Chinese Emotional Conversation Generation (CECG) Subtask (http://coai.cs.tsinghua.edu.cn/hml/challenge/dataset_description/) was processed into the 4 files:

category: target sentence emotion category
choice: target sentence emotional word annotation
source: source sentence
target: target sentence

I looked at https://github.com/AaronYALai/Seq2seqAttn_ECM as well. Any further information or guidance would be a great help, since it will also help me process an English dataset.

Regards

1YCxZ commented 5 years ago

Hi, I will give you a simple explanation of these 4 files.

First of all, this model is for single-turn dialogue, which means we have two sentences here: a source sentence and a target sentence, like a question and an answer. I split the dialogues into 2 files, source.txt and target.txt. The i-th line in source.txt corresponds to the i-th line in target.txt, and the same holds for category.txt and choice.txt.
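The line alignment described above can be sketched as a small loader. This is only an illustration, not code from the repository: the helper name `read_aligned` is hypothetical, and it assumes whitespace-tokenized text, one integer per line in category.txt, and space-separated 0/1 labels in choice.txt.

```python
from pathlib import Path


def read_aligned(data_dir):
    """Yield one example per line index from the four line-aligned files.

    Assumption: tokens are separated by whitespace and category.txt holds
    one integer per line; the repository's real loader may differ.
    """
    names = ["source.txt", "target.txt", "category.txt", "choice.txt"]
    columns = [
        (Path(data_dir) / name).read_text(encoding="utf-8").splitlines()
        for name in names
    ]
    # The i-th line of every file must describe the same dialogue.
    counts = {name: len(col) for name, col in zip(names, columns)}
    if len(set(counts.values())) != 1:
        raise ValueError(f"files are not line-aligned: {counts}")

    for source, target, category, choice in zip(*columns):
        example = {
            "source": source.split(),
            "target": target.split(),
            "category": int(category),
            "choice": [int(flag) for flag in choice.split()],
        }
        # choice.txt carries exactly one 0/1 label per target token.
        if len(example["choice"]) != len(example["target"]):
            raise ValueError(f"choice/target length mismatch: {target!r}")
        yield example
```

Checking the line counts up front catches the most common preprocessing mistake, where one file is filtered and the others are not.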

1. category: target sentence emotion category. This file needs an emotion classifier model to label the emotion type of the target sentence, such as happy, angry, or sad. Then I map each emotion type to a number.

2. choice: target sentence emotional word annotation. This file needs an emotional word dictionary to label the words in a target sentence. If a word is an emotional word we label it 1, otherwise 0. For example, "I am very happy ." is labeled "0 0 0 1 0".

3. source: source sentence. The source sentence in a dialogue.

4. target: target sentence. The target sentence in a dialogue.
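The labeling for category and choice can be sketched as below. The emotion-to-number mapping and the tiny word set are made up for illustration; the actual label ids and the emotional word dictionary used for this repository are not given in this thread.

```python
# Hypothetical label ids and word list, for illustration only.
EMOTION_TO_ID = {"other": 0, "happy": 1, "angry": 2, "sad": 3}
EMOTION_WORDS = {"happy", "angry", "sad"}


def category_id(emotion):
    """Map an emotion name (from a classifier or annotation) to its number."""
    return EMOTION_TO_ID[emotion]


def choice_labels(tokens, emotion_words=EMOTION_WORDS):
    """Return 1 for each token found in the emotion word set, else 0."""
    return [1 if token.lower() in emotion_words else 0 for token in tokens]


tokens = "I am very happy .".split()
print(" ".join(str(flag) for flag in choice_labels(tokens)))  # 0 0 0 1 0
print(category_id("happy"))  # 1 under the hypothetical mapping above
```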

HarshitSoni1903 commented 5 years ago

Hey, thank you so much for the help. Now I'm able to understand the data at least. For processing the data, is there a specific module that you are using, or should the https://github.com/AaronYALai/Seq2seqAttn_ECM/tree/master/emotionregressor module be tweaked?

1YCxZ commented 5 years ago

I haven't used his emotion classifier, so I'm not quite sure. However, I think you can try his module; it seems to work well.

HarshitSoni1903 commented 5 years ago

So which module do you suggest I use? That module does not create an emotional word dictionary or use one in any way. I am trying to fit the model on another dataset that has been separated into post.txt and response.txt (with emotion attributes already attached, in JSON as well as CSV format).

1YCxZ commented 5 years ago

OK, what you said means that you don't need an emotion classifier. So now you just need an English emotional word dictionary to produce choice.txt. I think you can find one on the Internet, since I found a Chinese emotional word dictionary using Google search.
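For a dataset that already carries emotion labels, the conversion could look roughly like this. The record fields (`post`, `response`, `emotion`), the helper name `write_ecm_files`, and the label mapping are all assumptions, since the actual JSON/CSV layout of that dataset is not shown in this thread.

```python
from pathlib import Path


def write_ecm_files(records, emotion_to_id, emotion_words, out_dir):
    """Write source/target/category/choice files, one line per record.

    Assumption: each record is a dict with hypothetical keys "post",
    "response", and "emotion" (the pre-annotated label of the response).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    lines = {"source": [], "target": [], "category": [], "choice": []}

    for record in records:
        source_tokens = record["post"].split()
        target_tokens = record["response"].split()
        lines["source"].append(" ".join(source_tokens))
        lines["target"].append(" ".join(target_tokens))
        # Reuse the existing annotation instead of running a classifier.
        lines["category"].append(str(emotion_to_id[record["emotion"]]))
        # One 0/1 flag per target token, looked up in the word dictionary.
        lines["choice"].append(
            " ".join("1" if t.lower() in emotion_words else "0" for t in target_tokens)
        )

    for name, rows in lines.items():
        (out / f"{name}.txt").write_text("\n".join(rows) + "\n", encoding="utf-8")
```

Loading the JSON or CSV into a list of such dicts is left out on purpose, because it depends on the dataset's own column names.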

HarshitSoni1903 commented 5 years ago

Okay, thanks, I will look for one. I guess after that it's just a word lookup: if a word is an emotion word it is 1, else 0. Which classifier did you use? My dataset was manually annotated by the providers, so I'll keep those labels, but I'm asking just in case.

1YCxZ commented 5 years ago

Sorry, I can't provide my classifier. You can use this emotion classifier: https://github.com/AaronYALai/Seq2seqAttn_ECM/tree/master/emotionregressor. You can also try using BERT as an emotion classifier.

1YCxZ / ECM-seq2seq

Help regarding dataset. #3