startreker-shzy closed this 4 months ago
OK, we will check it.
For SER evaluation, we banned the EMO_UNK token and scored only the utterances labeled happy, sad, angry, or neutral.
We have now added a ban_emo_unk parameter to ban it. You can update the code and decode again as follows:
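# m and kwargs are assumed to come from the earlier model-loading step
res = m.inference(
    data_in="1001_IEO_HAP_MD.wav",
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=False,
    ban_emo_unk=True,  # exclude the EMO_UNK token during decoding
    **kwargs,
)
print(res[0][0]["text"])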
The result will be <|en|><|HAPPY|><|Speech|><|woitn|>it's eleven o'clock
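If it helps with scoring, here is a minimal sketch of how the emotion tag could be pulled out of a result string like the one above and compared with the label encoded in a CREMA-D file name. The helper names (split_tags, emotion_of, reference_emotion) are hypothetical and not part of the model code. It assumes the emotion is always the second tag, as in the sample output, and the tag spellings other than HAPPY are assumptions.

```python
import re

# Tags look like <|en|><|HAPPY|><|Speech|><|woitn|> followed by the transcript.
TAG_RE = re.compile(r"<\|([^|]*)\|>")


def split_tags(text):
    """Return (list of tag names, transcript with the tags removed)."""
    tags = TAG_RE.findall(text)
    transcript = TAG_RE.sub("", text).strip()
    return tags, transcript


def emotion_of(text):
    """Assumes the emotion is the second tag, as in the sample output."""
    tags, _ = split_tags(text)
    return tags[1] if len(tags) > 1 else None


# CREMA-D file names carry the emotion code in the third "_"-separated field.
# Only the four scored classes are mapped; spellings besides HAPPY are assumed.
CREMA_TO_TAG = {"HAP": "HAPPY", "SAD": "SAD", "ANG": "ANGRY", "NEU": "NEUTRAL"}


def reference_emotion(filename):
    """Return the expected tag, or None for classes outside the four scored ones."""
    code = filename.split("_")[2]
    return CREMA_TO_TAG.get(code)


sample = "<|en|><|HAPPY|><|Speech|><|woitn|>it's eleven o'clock"
print(emotion_of(sample), reference_emotion("1001_IEO_HAP_MD.wav"))
```

Files for which reference_emotion returns None (for example the fear and disgust clips) would be skipped, so that only the four classes mentioned above are scored.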
I have tested emotion recognition on the CREMA-D dataset, but most of the outputs are EMO_UNKNOWN, which does not match the table below. Should a threshold be changed, or does the model need fine-tuning on emotion data?