FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model
https://funaudiollm.github.io/

Emotion output EMO_UNKNOWN #31

Closed startreker-shzy closed 4 months ago

startreker-shzy commented 4 months ago

I have tested emotion recognition with the CREMA-D dataset, but most of the output is EMO_UNKNOWN. [screenshot] I think this does not match the table below. [screenshot] Should some threshold be changed, or does the model need finetuning with emotion data?

GizGaze commented 4 months ago

I've got the same result with MELD and RAVDESS datasets. If you come across any results for SER, I would greatly appreciate it if you could share them with me.

LauraGPT commented 4 months ago

OK, we will check it.

gaochangfeng commented 4 months ago

For the SER evaluation, we banned the EMO_UNK token and only scored utterances labeled happy, sad, angry, or neutral. We have now added a ban_emo_unk parameter to ban the token at inference time. You can upgrade the code and decode again as follows:

from model import SenseVoiceSmall

# Load the model as in the repo README; from_pretrained returns the model
# and the decoding kwargs that inference() expects.
m, kwargs = SenseVoiceSmall.from_pretrained(model="iic/SenseVoiceSmall")
m.eval()

res = m.inference(
    data_in="1001_IEO_HAP_MD.wav",
    language="auto",  # or "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=False,
    ban_emo_unk=True,
    **kwargs,
)
print(res[0][0]["text"])

The result will be <|en|><|HAPPY|><|Speech|><|woitn|>it's eleven o'clock
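If it helps anyone scoring SER results on these datasets: below is a minimal sketch of pulling the emotion token out of the rich-transcription string and applying the four-class protocol described above (only happy/sad/angry/neutral utterances are scored; everything else is skipped). The helper names (`extract_emotion`, `ser_accuracy`) and the tag set are my own assumptions, not part of the SenseVoice API.

```python
import re

# Emotion tokens SenseVoice may emit (assumed set; extend as needed).
EMO_TAGS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL", "EMO_UNKNOWN",
            "FEARFUL", "DISGUSTED", "SURPRISED"}

def extract_emotion(text):
    """Return the first emotion token in a string like
    '<|en|><|HAPPY|><|Speech|><|woitn|>...' (hypothetical helper)."""
    for tag in re.findall(r"<\|([A-Z_]+)\|>", text):
        if tag in EMO_TAGS:
            return tag
    return None

# Four-class scoring: utterances whose reference or hypothesis falls
# outside these labels are dropped, not counted as errors.
SCORED = {"HAPPY", "SAD", "ANGRY", "NEUTRAL"}

def ser_accuracy(pairs):
    """pairs: iterable of (reference_emotion, predicted_emotion)."""
    scored = [(ref, hyp) for ref, hyp in pairs
              if ref in SCORED and hyp in SCORED]
    if not scored:
        return 0.0
    return sum(ref == hyp for ref, hyp in scored) / len(scored)
```

For example, `extract_emotion("<|en|><|HAPPY|><|Speech|><|woitn|>it's eleven o'clock")` returns `"HAPPY"`, and an EMO_UNKNOWN reference or prediction simply does not count toward the accuracy denominator.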