declare-lab / MELD

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation
GNU General Public License v3.0

Baseline results #10

Open stdoo opened 5 years ago

stdoo commented 5 years ago

Hi, I tried the bc_LSTM baseline with the bimodal setting for emotion classification, but the F1-score and accuracy for 'fear' and 'disgust' are always zero, so I can't reproduce the results in the paper.

The command I use:

python baseline.py -classify emotion -modality bimodal -train

The results:

          precision    recall  f1-score   support

       0     0.7322    0.7795    0.7551      1256
       1     0.4799    0.4662    0.4729       281
       2     0.0000    0.0000    0.0000        50
       3     0.2781    0.2019    0.2340       208
       4     0.4813    0.5448    0.5111       402
       5     0.0000    0.0000    0.0000        68
       6     0.3832    0.4377    0.4087       345

The emotion labels:

Emotion - {'neutral': 0, 'surprise': 1, 'fear': 2, 'sadness': 3, 'joy': 4, 'disgust': 5, 'anger': 6}.

Is there something wrong with my understanding?
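For anyone reading the report above: a class shows precision, recall, and F1 of zero whenever the model never predicts it correctly, which usually means it never predicts that class at all. Here is a minimal pure-Python sketch of that effect. The labels and predictions are made-up toy data, not MELD outputs, and `per_class_f1` is a hypothetical helper written for this illustration.

```python
# Label mapping as given in this thread.
EMOTIONS = {'neutral': 0, 'surprise': 1, 'fear': 2, 'sadness': 3,
            'joy': 4, 'disgust': 5, 'anger': 6}


def per_class_f1(y_true, y_pred, num_classes):
    """Return {class_id: F1}. F1 is 0.0 when a class has no true positives."""
    scores = {}
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Toy data (illustrative only): the classifier never outputs 'fear' (2).
y_true = [0, 0, 0, 2, 2, 4, 4, 6]
y_pred = [0, 0, 4, 0, 0, 4, 4, 6]

f1 = per_class_f1(y_true, y_pred, num_classes=len(EMOTIONS))

# Class ids that never appear in the predictions.
never_predicted = sorted(set(EMOTIONS.values()) - set(y_pred))
print(f1)
print("never predicted:", never_predicted)
```

Running the same kind of check on the real predictions would show whether 'fear' and 'disgust' are simply never predicted, which is a common symptom of class imbalance.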

sanzgiri commented 5 years ago

I get the same results as @stdoo above and am unable to reproduce the paper's results when using the bcLSTM model to classify emotion with text, audio, or text+audio features.

If I do the Sentiment classification, the F1-scores are closer to the values listed in Table 13, but do not match them exactly.

I am curious to know what was done differently in obtaining the results shown in the paper.

sanzgiri commented 5 years ago

I also tried setting the class weights as described in the README: class_weight = {0:4.0, 1:15.0, 2:15.0, 3:3.0, 4:1.0, 5:6.0, 7:3.0}, but I am still unable to reproduce the results.
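One variation that could be tried is deriving the weights from class frequencies rather than hand-tuning them. The sketch below uses the support counts from the classification report earlier in this thread as stand-in frequencies (in practice the training-split counts would be the natural choice) and the common "balanced" heuristic, total / (n_classes * count). This is an illustration only, not the weighting used in the paper, and both helper functions are hypothetical names.

```python
# Support counts per class id, copied from the classification report in this
# thread. Assumption: used here only as stand-in class frequencies.
support = {0: 1256, 1: 281, 2: 50, 3: 208, 4: 402, 5: 68, 6: 345}


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * count_of_class)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}


def check_keys(class_weight, num_classes):
    """Raise if the dict does not cover exactly the labels 0..num_classes-1."""
    expected = set(range(num_classes))
    actual = set(class_weight)
    if actual != expected:
        raise ValueError(
            "missing keys: %s, unexpected keys: %s"
            % (sorted(expected - actual), sorted(actual - expected))
        )


class_weight = balanced_class_weights(support)
check_keys(class_weight, num_classes=7)
for cls, w in sorted(class_weight.items()):
    print(cls, round(w, 3))
```

Note that the dict quoted above uses key 7 where the label mapping ends at 6, so a check like `check_keys` would flag it.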

devamanyu commented 5 years ago

Hi,

Thanks for reporting this issue. Previously, we have had users who were able to recreate our results and improve upon them. Updates to the dependencies might be the reason behind this.

Nevertheless, class weights were the main strategy we used to obtain the baseline results in the paper. We encourage you to try other variations of these weights too. Meanwhile, we will try to update them for the newer package versions.

hollowgalaxy commented 5 years ago

@devamanyu, can you provide the versions of the dependencies you are referring to?

Yuri-Kim commented 4 years ago

@devamanyu I have the same problem as @stdoo, and I also tried setting the class weights as described in the README (like @sanzgiri). Can you provide the versions of the dependencies?

Aidenfaustine commented 3 years ago

@devamanyu I have the same problem as @stdoo, and I also tried setting the class weights as described in the README (like @sanzgiri). Can you provide the versions of the dependencies?

Aidenfaustine commented 3 years ago

Emotion - {'neutral': 0, 'surprise': 1, 'fear': 2, 'sadness': 3, 'joy': 4, 'disgust': 5, 'anger': 6}.

Dear @stdoo, could you please give me some guidance on how to fix this? Thanks.

Aidenfaustine commented 3 years ago

@devamanyu, sorry to bother you. I'm new to TensorFlow and don't have a CS/EE background. You said the main strategy is to adjust the class weights, but I don't know how to do that. Specifically, which file should I modify to change the weights, and what code needs to change?

Thanks.

YananSunn commented 2 years ago

Hi, I tried the class_weight strategy but found that class_weight in Keras must be a dict, not an array. This holds for both the latest version and the 2.0.2 version your work used: https://github.com/keras-team/keras/blob/576f8fe8e6a21b7094316d36c315c2f6bdb487cc/keras/engine/training.py#L557

I doubt whether this strategy was actually used as described. If it was, could you give a more specific hint, such as how to change the code to apply class_weight? Thanks a lot.
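As a starting point for the dict-versus-array question, a list of weights can be turned into the dict form with `dict(enumerate(...))`. If the model produces one prediction per utterance in a dialogue, another option is to fold the class weights into per-utterance sample weights. The sketch below is pure Python and makes several assumptions that are not confirmed by the repository: the weight values are the ones quoted from the README in this thread, taken in label order 0 to 6; labels are one-hot and shaped [dialogue][utterance][class]; and `mask` is a 0/1 padding mask. The function names are hypothetical.

```python
# Weights quoted in this thread, assumed to be in label order 0..6.
weights_list = [4.0, 15.0, 15.0, 3.0, 1.0, 6.0, 3.0]

# Dict form: class index -> weight.
class_weight = dict(enumerate(weights_list))


def one_hot(cls, num_classes=7):
    """Build a one-hot list for a class id (toy helper for this example)."""
    return [1 if i == cls else 0 for i in range(num_classes)]


def temporal_sample_weights(one_hot_labels, mask, class_weight):
    """Per-utterance weight = class weight of the true label * padding mask."""
    weights = []
    for dialogue, dialogue_mask in zip(one_hot_labels, mask):
        row = []
        for label_vec, m in zip(dialogue, dialogue_mask):
            cls = label_vec.index(max(label_vec))  # argmax of the one-hot row
            row.append(class_weight[cls] * m)      # padded steps get weight 0
        weights.append(row)
    return weights


# Toy batch: two dialogues padded to three utterances each.
labels = [
    [one_hot(0), one_hot(2), one_hot(4)],
    [one_hot(5), one_hot(0), one_hot(0)],  # last two steps are padding
]
mask = [[1, 1, 1], [1, 0, 0]]

sample_weight = temporal_sample_weights(labels, mask, class_weight)
print(sample_weight)
```

In Keras 2.x, a per-timestep array like this can be passed as `sample_weight` to `fit` when the model is compiled with `sample_weight_mode='temporal'`. Whether the baseline script is set up that way is something to confirm in its code.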