Closed ZKayell closed 2 years ago
The DailyDialog dataset is an unbalanced dataset: the 'no_emotion' label accounts for about 83% of all emotion labels. You can run preprocess_dailydialog2.py to check all the encoded emotion labels. Note that the emotion label encoding dict can change between runs! In preprocess_dailydialog2.py, line 78, I added some print statements:
for i, label in enumerate(all_emotion_labels):
    emotion_label_encoder[label] = i
    print(str(i) + " " + str(label))
    emotion_label_decoder[i] = label
    print(str(emotion_label_encoder[label]) + " " + str(emotion_label_decoder[i]))
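The encoding changes between runs because it depends on the iteration order of the label collection. One way to make it deterministic (a sketch, not part of the original script; `all_emotion_labels` is assumed to be the set of label strings gathered during preprocessing) is to sort the labels before enumerating:

```python
# Sketch: build a deterministic label encoding by sorting the label names,
# so the encoder/decoder dicts are identical across runs.
all_emotion_labels = {"no_emotion", "anger", "disgust", "fear",
                      "happiness", "sadness", "surprise"}

emotion_label_encoder, emotion_label_decoder = {}, {}
for i, label in enumerate(sorted(all_emotion_labels)):
    emotion_label_encoder[label] = i
    emotion_label_decoder[i] = label

print(emotion_label_encoder)
```

With sorted labels the mapping is fixed, so the masked label id and the loss weight order no longer need to be re-checked after every preprocessing run.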
So the answer is: at preprocessing time, the 'no_emotion' label is mapped to some encoded label x. When computing the F1 score, the original paper masks this label out to get a more meaningful result.
You can check train_daily_feature3.py, line 193:
avg_fscore_w = round(f1_score(labels, preds, average='micro', labels=[0, 1, 2, 4, 5, 6]) * 100, 2)
# Add precision and recall
precision_w = round(precision_score(labels, preds, average='micro', labels=[0, 1, 2, 4, 5, 6]) * 100, 2)
recall_w = round(recall_score(labels, preds, average='micro', labels=[0, 1, 2, 4, 5, 6]) * 100, 2)
print('fscore: {}, precision: {}, recall: {}'.format(avg_fscore_w, precision_w, recall_w))
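Here is a small self-contained sketch of that masked metric, using toy labels and predictions (hypothetical data, with the 'no_emotion' id assumed to be 3 as in the case below):

```python
# Minimal sketch: exclude the majority label (assumed encoded as 3,
# 'no_emotion') from micro-averaged precision/recall/F1 via `labels=`.
from sklearn.metrics import f1_score, precision_score, recall_score

labels = [3, 3, 3, 3, 0, 1, 2, 4]   # toy ground truth (hypothetical)
preds  = [3, 3, 3, 0, 0, 1, 2, 5]   # toy predictions (hypothetical)
kept = [0, 1, 2, 4, 5, 6]           # every class except the 'no_emotion' id

f1 = round(f1_score(labels, preds, average='micro', labels=kept) * 100, 2)
p  = round(precision_score(labels, preds, average='micro', labels=kept) * 100, 2)
r  = round(recall_score(labels, preds, average='micro', labels=kept) * 100, 2)
print('fscore: {}, precision: {}, recall: {}'.format(f1, p, r))
```

Passing `labels=kept` makes sklearn count true/false positives only for the listed classes, so the dominant 'no_emotion' predictions no longer inflate the scores.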
In my case, the 'no_emotion' label was mapped to encoded label 3. And don't forget to change the loss weights at line 356:
loss_weights = torch.FloatTensor([1 / 0.0017,
1 / 0.0034,
1 / 0.1251,
1 / 0.831,
1 / 0.0099,
1 / 0.0177,
1 / 0.0112])
Thank you for your answer! So, should rarer labels get larger loss weights? How did you get these weights?
Actually, the settings in train_daily_feature3.py follow DialogRNN, which uses micro-F1 and masks the 'no_emotion' label. The loss weights are also copied from DialogRNN; they are documented at line 343 of train_daily_feature3.py:
# 0-3-fear-0.0017
# 1-2-disgust-0.0034
# 2-4-happiness-0.1251
# 3-0-no emotion-0.831
# 4-1-anger-0.0099
# 5-6-surprise-0.0177
# 6-5-sadness-0.0112
The first number denotes the encoded label after preprocessing and the second number denotes the original label, followed by the emotion name and its proportion in the dataset. Change the loss_weights according to your own preprocessing results.
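Putting this together, the weights are just inverse class proportions. A sketch of recomputing them for your own encoding (the proportions are the ones listed in the comments above; reorder them to match YOUR encoded labels after preprocessing):

```python
# Sketch: build loss_weights as inverse class frequencies, indexed by
# the encoded label id. Rarer classes get larger weights.
import torch

# encoded label id -> proportion of that class in the dataset
proportions = [0.0017, 0.0034, 0.1251, 0.831, 0.0099, 0.0177, 0.0112]
loss_weights = torch.FloatTensor([1.0 / p for p in proportions])
print(loss_weights)
```

With these values, 'fear' (0.17% of the data) gets a weight of roughly 588 while 'no_emotion' (83.1%) gets roughly 1.2, so the loss pays far more attention to the rare emotions.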
I get it. Thanks again.
Hello. When I trained on the feature files generated by preprocess_dailydialog2.py, the F1 score was 82.24, which doesn't seem right. Do you know what caused this? Much appreciated. The details are as follows: [epoch 1 train_loss 0.5816 train_acc 81.85 train_fscore 82.24 valid_loss 0.3864 valid_acc 88.09 valid_fscore 88.51 test_loss 0.5778 test_acc 81.67 test_fscore 82.29 time 473.1s]
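For reference, micro-averaged F1 computed over all classes (i.e., without masking the 'no_emotion' label) reduces to plain accuracy, so on a dataset where ~83% of utterances are 'no_emotion', even a model that always predicts the majority class lands in the low 80s. A sketch with toy data illustrating this:

```python
# Sketch: unmasked micro-F1 equals accuracy, which is why scores look
# inflated on a dataset dominated by one class.
from sklearn.metrics import accuracy_score, f1_score

labels = [3] * 83 + [0, 1, 2, 4, 5, 6] * 2 + [0] * 5  # ~83% majority (toy)
preds  = [3] * len(labels)                             # always predict class 3

acc = accuracy_score(labels, preds)
f1  = f1_score(labels, preds, average='micro')
print(round(acc * 100, 2), round(f1 * 100, 2))
```

So a train_fscore around 82 usually means the 'no_emotion' label was not excluded from the metric; check that the `labels=` mask matches the encoded id of 'no_emotion' from your preprocessing run.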