PiotrSobczak / speech-emotion-recognition

Multi-modal Speech Emotion Recognition on the IEMOCAP dataset

What's this project about?

The goal of this project is to create a multi-modal Speech Emotion Recognition system on the IEMOCAP dataset.

Project outline

What's the IEMOCAP dataset?

IEMOCAP stands for the Interactive Emotional Dyadic Motion Capture dataset. It is one of the most widely used databases for multi-modal speech emotion recognition.

Original class distribution:

The IEMOCAP database suffers from a major class imbalance. To address this, we reduce the number of classes to 4 and merge Excitement and Happiness into one class.
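As a minimal sketch of this label merging, the mapping below collapses raw IEMOCAP annotations into the 4 target classes. The exact label codes and preprocessing code in the repository may differ; this only illustrates the idea.

```python
# Illustrative mapping from raw IEMOCAP label codes (assumed) to the 4 target classes.
# Excitement is folded into the Happiness class; unlisted labels are dropped.
IEMOCAP_TO_TARGET = {
    "neu": "neutral",
    "ang": "anger",
    "sad": "sadness",
    "hap": "happiness",
    "exc": "happiness",  # merged with Happiness
}

def merge_label(raw_label):
    """Return one of the 4 target classes, or None if the sample should be discarded."""
    return IEMOCAP_TO_TARGET.get(raw_label)
```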

Final class distribution

Related works overview

References: [1] [2] [3] [4] [5] [6] [7] [8] [9]

Tested Architectures

Acoustic Architectures

| Classifier Architecture | Input type | Accuracy [%] |
| --- | --- | --- |
| Convolutional Neural Network | Spectrogram | 55.3 |
| Bidirectional LSTM with self-attention | LLD features | 53.2 |
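For reference, below is a minimal PyTorch sketch of a CNN classifier over spectrogram inputs, in the spirit of the acoustic model above. The layer counts, channel sizes, and pooling are assumptions for illustration, not the repository's exact architecture.

```python
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    """Illustrative CNN over (1 x freq x time) spectrograms; sizes are assumed."""

    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),  # fixed-size output regardless of spectrogram length
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x):
        # x: (batch, 1, freq_bins, time_frames)
        h = self.features(x)
        return self.classifier(h.flatten(1))
```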

Linguistic Architectures

| Classifier Architecture | Input type | Accuracy [%] |
| --- | --- | --- |
| LSTM | Transcription | 58.9 |
| Bidirectional LSTM | Transcription | 59.4 |
| Bidirectional LSTM with self-attention | Transcription | 63.1 |
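Below is a minimal PyTorch sketch of the best-performing linguistic variant, a bidirectional LSTM with a simple additive self-attention pooling over word embeddings of the transcription. Embedding, hidden, and vocabulary sizes are assumptions, not the repository's exact hyperparameters.

```python
import torch
import torch.nn as nn

class AttentionBiLSTM(nn.Module):
    """Illustrative BiLSTM with self-attention pooling over a transcription; sizes are assumed."""

    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=128, num_classes=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attention = nn.Linear(2 * hidden_dim, 1)  # one attention score per time step
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        h, _ = self.lstm(self.embedding(token_ids))        # (batch, seq_len, 2*hidden)
        weights = torch.softmax(self.attention(h), dim=1)  # attention over time steps
        context = (weights * h).sum(dim=1)                 # attention-weighted sentence vector
        return self.classifier(context)
```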

Ensemble Architectures

Ensemble architectures build on the most accurate acoustic and linguistic models: the convolutional neural network as the acoustic model and the bidirectional LSTM with self-attention as the linguistic model. Decision-level ensembles fuse the two models' predictions, while the feature-level ensemble fuses their intermediate representations; illustrative sketches follow the table below.

| Ensemble type | Accuracy [%] |
| --- | --- |
| Decision-level Ensemble (maximum confidence) | 66.7 |
| Decision-level Ensemble (average) | 68.8 |
| Decision-level Ensemble (weighted average) | 69.0 |
| Feature-level Ensemble | 71.1 |
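A minimal sketch of the decision-level variants, assuming both models output class logits for the same utterance. The acoustic weight `w` and the exact maximum-confidence rule are assumptions; the repository's fusion code may differ.

```python
import torch

def decision_level_ensemble(acoustic_logits, linguistic_logits, mode="average", w=0.5):
    """Illustrative decision-level fusion of two classifiers; `w` is an assumed acoustic weight."""
    p_a = torch.softmax(acoustic_logits, dim=-1)
    p_l = torch.softmax(linguistic_logits, dim=-1)
    if mode == "average":
        fused = (p_a + p_l) / 2
    elif mode == "weighted_average":
        fused = w * p_a + (1 - w) * p_l
    elif mode == "max_confidence":
        # per sample, keep the prediction of whichever model is more confident
        more_confident_acoustic = p_a.max(-1, keepdim=True).values >= p_l.max(-1, keepdim=True).values
        fused = torch.where(more_confident_acoustic, p_a, p_l)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return fused.argmax(dim=-1)
```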

Feature-level Ensemble Architecture
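A minimal sketch of the feature-level idea: penultimate-layer representations from the acoustic and linguistic models are concatenated and passed through a small joint classifier. The feature dimensions and the joint layers are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FeatureLevelEnsemble(nn.Module):
    """Illustrative feature-level fusion head; feature dimensions are assumed."""

    def __init__(self, acoustic_dim=512, linguistic_dim=256, num_classes=4):
        super().__init__()
        self.joint = nn.Sequential(
            nn.Linear(acoustic_dim + linguistic_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, acoustic_features, linguistic_features):
        # both inputs: (batch, feature_dim) hidden representations from the two models
        return self.joint(torch.cat([acoustic_features, linguistic_features], dim=-1))
```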

Feature-level Ensemble Confusion Matrix

How to prepare the IEMOCAP dataset?

How to run?

Run hyperparameter tuning

python3 -m speech_emotion_recognition.run_hyperparameter_tuning -m acoustic-spectrogram

Run training

python3 -m speech_emotion_recognition.run_training_ensemble -m acoustic-spectrogram

Run ensemble training

python3 -m speech_emotion_recognition.run_training_ensemble -a /path/to/acoustic_spec_model.torch -l /path/to/linguistic_model.torch

Run evaluation

python3 -m speech_emotion_recognition.run_evaluate -a /path/to/acoustic_spec_model.torch -l /path/to/linguistic_model.torch -e /path/to/ensemble_model.torch

How to run in Docker? (CPU only)

Run hyperparameter tuning

docker run -t -v /path/to/project/data:/data -v /path/to/project/saved_models:/saved_models -v /tmp:/tmp speech-emotion-recognition -m speech_emotion_recognition.run_hyperparameter_tuning -m acoustic-spectrogram

Run training

docker run -t -v /path/to/project/data:/data -v /path/to/project/saved_models:/saved_models -v /tmp:/tmp speech-emotion-recognition -m speech_emotion_recognition.run_training_ensemble -m acoustic-spectrogram

Run ensemble training

docker run -t -v /path/to/project/data:/data -v /path/to/project/saved_models:/saved_models -v /tmp:/tmp speech-emotion-recognition -m speech_emotion_recognition.run_training_ensemble -a /path/to/acoustic_spec_model.torch -l /path/to/linguistic_model.torch

Run evaluation

docker run -t -v /path/to/project/data:/data -v /path/to/project/saved_models:/saved_models -v /tmp:/tmp speech-emotion-recognition -m speech_emotion_recognition.run_evaluate -a /path/to/acoustic_spec_model.torch -l /path/to/linguistic_model.torch -e /path/to/ensemble_model.torch