The goal of this project is to create a multi-modal Speech Emotion Recognition system trained on the IEMOCAP dataset.
IEMOCAP stands for the Interactive Emotional Dyadic Motion Capture database. It is the most widely used corpus for multi-modal speech emotion recognition.
Original class distribution:
The IEMOCAP database suffers from major class imbalance. To mitigate this problem, we reduce the number of classes to four and merge Excitement and Happiness into a single class.
Final class distribution:
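The merging step above can be sketched as a simple label mapping. This is a minimal illustration, not the project's actual preprocessing code; the abbreviated label names (`"exc"`, `"hap"`, etc.) follow IEMOCAP's conventions, and utterances outside the four target classes are assumed to be dropped.

```python
# Hypothetical mapping from IEMOCAP's categorical labels to the 4 target classes.
# Excitement ("exc") is merged into Happiness; labels not listed are discarded.
LABEL_MAP = {
    "neu": "neutral",
    "sad": "sadness",
    "ang": "anger",
    "hap": "happiness",
    "exc": "happiness",  # merged into happiness to reduce class imbalance
}

def relabel(utterances):
    """Keep only utterances whose label maps to one of the 4 classes.

    `utterances` is a list of (sample_id, label) pairs.
    """
    return [(uid, LABEL_MAP[lab]) for uid, lab in utterances if lab in LABEL_MAP]
```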
Acoustic models:

Classifier Architecture | Input type | Accuracy [%]
---|---|---
Convolutional Neural Network | Spectrogram | 55.3
Bidirectional LSTM with self-attention | LLD features | 53.2
Linguistic models:

Classifier Architecture | Input type | Accuracy [%]
---|---|---
LSTM | Transcription | 58.9
Bidirectional LSTM | Transcription | 59.4
Bidirectional LSTM with self-attention | Transcription | 63.1
The ensemble architectures combine the most accurate acoustic and linguistic models: the Convolutional Neural Network acoustic model and the bidirectional LSTM with self-attention linguistic model.
Ensemble type | Accuracy [%]
---|---
Decision-level Ensemble (maximum confidence) | 66.7
Decision-level Ensemble (average) | 68.8
Decision-level Ensemble (weighted average) | 69.0
Feature-level Ensemble | 71.1
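The three decision-level fusion strategies above can be sketched as follows. This is a minimal illustration, not the project's implementation; the function name and the weight parameter `w` are assumptions, and the inputs are the per-class probability vectors produced by the two models.

```python
import numpy as np

def decision_level_fusion(p_acoustic, p_linguistic, mode="average", w=0.5):
    """Fuse the per-class probabilities of the acoustic and linguistic models.

    `w` is the (assumed) weight of the linguistic model in the weighted average.
    Returns the index of the predicted class.
    """
    p_a, p_l = np.asarray(p_acoustic), np.asarray(p_linguistic)
    if mode == "max_confidence":
        # take the prediction of whichever model is more confident
        fused = p_a if p_a.max() >= p_l.max() else p_l
    elif mode == "average":
        fused = (p_a + p_l) / 2
    elif mode == "weighted_average":
        fused = (1 - w) * p_a + w * p_l
    else:
        raise ValueError(f"unknown fusion mode: {mode}")
    return int(np.argmax(fused))
```

The feature-level ensemble, by contrast, concatenates intermediate representations of both models and trains a joint classifier on top, which is what yields the best accuracy in the table above.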
Run hyperparameter tuning for a model (here the acoustic spectrogram model):

```shell
python3 -m speech_emotion_recognition.run_hyperparameter_tuning -m acoustic-spectrogram
```

Train a single model (here the acoustic spectrogram model):

```shell
python3 -m speech_emotion_recognition.run_training_ensemble -m acoustic-spectrogram
```

Train the ensemble from pretrained acoustic and linguistic models:

```shell
python3 -m speech_emotion_recognition.run_training_ensemble -a /path/to/acoustic_spec_model.torch -l /path/to/linguistic_model.torch
```

Evaluate the trained models:

```shell
python3 -m speech_emotion_recognition.run_evaluate -a /path/to/acoustic_spec_model.torch -l /path/to/linguistic_model.torch -e /path/to/ensemble_model.torch
```
The same scripts can be run inside Docker, mounting the project's `data` and `saved_models` directories along with `/tmp`:

```shell
docker run -t -v /path/to/project/data:/data -v /path/to/project/saved_models:/saved_models -v /tmp:/tmp speech-emotion-recognition -m speech_emotion_recognition.run_hyperparameter_tuning -m acoustic-spectrogram

docker run -t -v /path/to/project/data:/data -v /path/to/project/saved_models:/saved_models -v /tmp:/tmp speech-emotion-recognition -m speech_emotion_recognition.run_training_ensemble -m acoustic-spectrogram

docker run -t -v /path/to/project/data:/data -v /path/to/project/saved_models:/saved_models -v /tmp:/tmp speech-emotion-recognition -m speech_emotion_recognition.run_training_ensemble -a /path/to/acoustic_spec_model.torch -l /path/to/linguistic_model.torch

docker run -t -v /path/to/project/data:/data -v /path/to/project/saved_models:/saved_models -v /tmp:/tmp speech-emotion-recognition -m speech_emotion_recognition.run_evaluate -a /path/to/acoustic_spec_model.torch -l /path/to/linguistic_model.torch -e /path/to/ensemble_model.torch
```