ddlBoJack / emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

The performance of utterance-level features is poor. #33

Open 15HoneyMoon opened 2 weeks ago

15HoneyMoon commented 2 weeks ago

Hello, I'm new to this field. Why do I get poor results when I use the utterance-level features you provide for emotion recognition? The WA was only a little over 60. I also use only a linear layer as the base model. I look forward to your answer, thank you.

ddlBoJack commented 2 weeks ago

Which model did you use and which dataset did you test?

15HoneyMoon commented 2 weeks ago

I used the emotion2vec model on the IEMOCAP dataset, with the utterance-level features from the links you provide.

ddlBoJack commented 2 weeks ago

Hi, note that there are different settings for the IEMOCAP dataset. The mainstream setting is to train a model for standard 4-class emotion classification on 5531 utterances. The features for this setting, which is the one we reported in the paper, are provided here https://github.com/ddlBoJack/emotion2vec/tree/main/iemocap_downstream (you can also extract them with emotion2vec yourself). We also provide the features of the whole IEMOCAP dataset here https://github.com/ddlBoJack/emotion2vec?tab=readme-ov-file#extract-features-from-your-dataset
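
A mismatch in the label set is one possible cause of a gap like this, so it can help to check which utterances end up in the 4-class subset. The sketch below assumes the commonly used convention of keeping angry, happy, neutral, and sad and folding "excited" into "happy"; the thread does not spell that convention out, and the label abbreviations, the `to_four_class` helper, and the toy data are all hypothetical.

```python
from collections import Counter

# Assumption: IEMOCAP-style label abbreviations, with "exc" (excited)
# merged into "hap" (happy). Every other label is dropped.
FOUR_CLASS_MAP = {
    "ang": "ang",
    "hap": "hap",
    "exc": "hap",
    "neu": "neu",
    "sad": "sad",
}


def to_four_class(samples):
    """Keep only (utterance_id, label) pairs that map to one of the four classes."""
    kept = []
    for utt_id, label in samples:
        mapped = FOUR_CLASS_MAP.get(label)
        if mapped is not None:
            kept.append((utt_id, mapped))
    return kept


# Hypothetical toy data, not real IEMOCAP entries.
toy = [
    ("utt1", "ang"),
    ("utt2", "exc"),
    ("utt3", "fru"),  # frustrated: outside the four classes, so dropped
    ("utt4", "neu"),
    ("utt5", "sad"),
    ("utt6", "hap"),
]

filtered = to_four_class(toy)
counts = Counter(label for _, label in filtered)
print(len(filtered), dict(counts))
```

If the same filter is applied to the full IEMOCAP label list and matches the mainstream setting, the number of kept utterances should come out to 5531.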

15HoneyMoon commented 2 weeks ago

OK, I also used 5531 utterances from the IEMOCAP dataset for 4-class emotion classification, and I don't know what the problem is. I'll extract the features myself with emotion2vec and see whether the performance improves to the level reported in the paper. Thank you!!
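
When comparing against the paper, it is also worth checking that the metric is computed the same way. In speech emotion recognition, WA (weighted accuracy) is commonly the overall accuracy over all utterances, while UA (unweighted accuracy) is the mean of the per-class recalls. The sketch below follows those common definitions; the function names and toy labels are illustrative assumptions and are not taken from the emotion2vec code.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally."""
    total = defaultdict(int)
    hit = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            hit[t] += 1
    recalls = [hit[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels: 4 neutral and 2 sad utterances.
y_true = ["neu", "neu", "neu", "neu", "sad", "sad"]
y_pred = ["neu", "neu", "neu", "neu", "neu", "sad"]

wa = weighted_accuracy(y_true, y_pred)    # 5 of 6 correct
ua = unweighted_accuracy(y_true, y_pred)  # (4/4 + 1/2) / 2
print(f"WA={wa:.4f} UA={ua:.4f}")
```

On imbalanced data the two numbers can differ noticeably, so a WA taken from one evaluation protocol is not directly comparable with a UA, or with a WA from a different train/test split.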