ddlBoJack / emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Why do I only get 5 classes of emotion? #45

Closed · Mahaotian1 closed this 1 week ago

Mahaotian1 commented 2 months ago

According to the official description, the emotion2vec_plus model should be able to categorize speech into 9 classes, so why did I actually get only 5 classes (angry, happy, neutral, sad, and unknown) when I ran it?

Usage:

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.emotion_recognition,
    model="iic/emotion2vec_plus_base",
    device="gpu:1",
)
rec_result = inference_pipeline(
    "/home/data2/tts_data/raw/opensrc/Emotion Speech Dataset/0007/Neutral/0007_000239.wav",
    output_dir="./cosyvoice_embedding",
    granularity="utterance",
    extract_embedding=False,
)

LorantSzaboOxit commented 1 month ago

emotion2vec_plus_seed/base/large works with 5 classes (angry, happy, neutral, sad, unk)

emotion2vec/base/large/base_finetuned works with 9 classes (angry, disgusted, fearful, happy, neutral, other, sad, surprised, unknown)

if I'm right.
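
A quick way to check this is to run one checkpoint from each family on the same file and compare the returned label lists. Here is a minimal sketch with FunASR's AutoModel, assuming the ModelScope IDs iic/emotion2vec_plus_base and iic/emotion2vec_base_finetuned and the usual output format of a list with one dict (carrying a "labels" field) per utterance:

from funasr import AutoModel

# one checkpoint per family (assumed ModelScope IDs)
model_plus = AutoModel(model="iic/emotion2vec_plus_base")     # 5-class family, per the comment above
model_ft = AutoModel(model="iic/emotion2vec_base_finetuned")  # 9-class family, per the comment above

# example audio bundled with the downloaded model
wav_file = f"{model_plus.model_path}/example/test.wav"

for name, model in [("plus_base", model_plus), ("base_finetuned", model_ft)]:
    rec_result = model.generate(wav_file, granularity="utterance", extract_embedding=False)
    # assumed output: a list with one dict per utterance, class names under "labels"
    print(name, rec_result[0]["labels"])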

ddlBoJack commented 1 month ago

You can use this pipeline to get 9-class labels:

from funasr import AutoModel

# load the emotion2vec+ large checkpoint through FunASR
model = AutoModel(model="iic/emotion2vec_plus_large")

# example audio bundled with the downloaded model
wav_file = f"{model.model_path}/example/test.wav"
# utterance-level recognition without saving the extracted embedding
rec_result = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
print(rec_result)
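
For what it's worth, rec_result is typically a list with one dict per utterance, holding parallel "labels" and "scores" lists; assuming that format, the top-scoring of the 9 labels can be read off like this:

# sketch, assuming rec_result is a list of dicts with parallel "labels"/"scores" lists
entry = rec_result[0]
best = max(range(len(entry["scores"])), key=lambda i: entry["scores"][i])
print(entry["labels"][best], entry["scores"][best])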
Mahaotian1 commented 1 month ago


It works, thanks a lot!