cnlinxi / speech_emotion


speech_emotion

Detect human emotion from audio.

Some of the code is adapted from Speech_emotion_recognition_BLSTM; many thanks to its author.

Get started

environment: Python 3

main dependencies

dataset

Berlin Database of Emotional Speech (Emo-DB). Download it and unzip it into the data/ folder.
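For orientation, here is a minimal sketch of how emotion labels can be read from Emo-DB file names. It relies on the corpus naming convention, where the 6th character of each file name is the initial of the German emotion word. The function name `emotion_from_filename` and the example path are illustrative assumptions, not taken from this repository, and the label strings used by train.py may differ.

```python
from pathlib import Path

# Emo-DB naming convention: the 6th character of the file name encodes the
# emotion, using the initial of the German word.
EMODB_CODES = {
    "W": "anger",      # Wut
    "L": "boredom",    # Langeweile
    "E": "disgust",    # Ekel
    "A": "fear",       # Angst
    "F": "happiness",  # Freude
    "T": "sadness",    # Trauer
    "N": "neutral",
}


def emotion_from_filename(path):
    """Return the emotion label encoded in an Emo-DB file name (hypothetical helper)."""
    stem = Path(path).stem
    if len(stem) < 6 or stem[5] not in EMODB_CODES:
        raise ValueError(f"not an Emo-DB style file name: {path!r}")
    return EMODB_CODES[stem[5]]


if __name__ == "__main__":
    # The directory layout below is an assumption for illustration only.
    print(emotion_from_filename("data/wav/03a01Fa.wav"))
```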

How to use it

  1. python train.py

    Train the model. You can skip this step because the trained model, "weights_blstm_hyperas_1.h5", has already been uploaded. If you retrain the model, features have to be extracted from the Berlin dataset on the first run. To save time, the audio feature files "berlin_db.p" and "berlin_features.p" have also been uploaded.

  2. python predict.py

    Predict the emotion in an audio file. You must specify the path of the audio file to predict. For good performance, the audio should be shorter than 5 seconds. The output looks like this:

    the top 2 emotion is: ('happiness', 0.20501734)
    the top 2 emotion is: ('neutral', 0.29067296)
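The sketch below shows one way to turn a vector of class probabilities into the top-2 (label, probability) pairs reported above. The function `top_k_emotions`, the label list, and the probability values are assumptions for illustration; predict.py may compute and order its output differently.

```python
import numpy as np


def top_k_emotions(probs, labels, k=2):
    """Return the k most probable (label, probability) pairs, highest first."""
    probs = np.asarray(probs, dtype=float)
    if probs.shape != (len(labels),):
        raise ValueError("probs must hold exactly one value per label")
    # argsort is ascending, so reverse it and keep the first k indices.
    order = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in order]


if __name__ == "__main__":
    # Hypothetical labels and model output, not real results from this repository.
    labels = ["anger", "happiness", "neutral"]
    for pair in top_k_emotions([0.1, 0.3, 0.6], labels):
        print("top emotion:", pair)
```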

File structure

More details

The model combines a Bi-LSTM with an attention mechanism. A "weighted pooling" layer is built to handle frames that are unrelated to emotion.

Silent frames are assigned small weights, so the pooling operation effectively filters them out. Similarly, non-silent frames receive different weights depending on how much emotion they carry. The attention model therefore focuses not only on speech energy but also on emotional content. The attention weights are computed with a softmax, as in logistic regression.
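As a rough illustration of weighted pooling, the NumPy sketch below scores each frame with a parameter vector, normalizes the scores with a softmax, and sums the frames using those weights, following the formulation in the reference paper. The names `attention_pool`, `frames`, and `u` are assumptions; in the trained model the frames would be Bi-LSTM outputs and `u` a learned parameter, and the actual layer in this repository may be implemented differently.

```python
import numpy as np


def attention_pool(frames, u):
    """Softmax-weighted pooling over time.

    frames: array of shape (T, D), one D-dimensional vector per frame.
    u:      array of shape (D,), the attention parameter vector.
    Returns the pooled vector of shape (D,) and the weights of shape (T,).
    """
    frames = np.asarray(frames, dtype=float)
    u = np.asarray(u, dtype=float)
    scores = frames @ u                # one scalar score per frame
    scores = scores - scores.max()     # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()  # softmax over the time axis
    pooled = weights @ frames          # weighted sum of the frames
    return pooled, weights


if __name__ == "__main__":
    # Toy input: the first frame stands in for silence, the second for speech.
    frames = [[0.0, 0.0], [1.0, 0.0]]
    pooled, weights = attention_pool(frames, u=[10.0, 0.0])
    print(weights)  # the low-scoring frame gets a weight close to zero
    print(pooled)
```

With these toy values the silent frame contributes almost nothing to the pooled vector, which is the filtering effect described above.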

The accuracy on the validation set is 60.87%.

Reference

S. Mirsamadi, E. Barsoum, and C. Zhang, “Automatic speech emotion recognition using recurrent neural networks with local attention,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, U.S.A., Mar. 2017, IEEE, pp. 2227–2231.

Contact

cnmengnan@gmail.com

blog: WinterColor blog

enjoy it