This repository contains the official code for developing an online emotion recognition classifier based on audio-visual modalities and deep reinforcement learning techniques, introduced here.
It is combined with the corresponding repositories for preprocessing unimodal and multi-modal emotional datasets, such as AffectNet, IEMOCAP, RML, and BAUM-1, to reproduce the paper's results.
Preprocessing code for AffectNet, IEMOCAP, and RML is provided by the authors here, here, and here, respectively.
If you find this repository useful in your research, please consider citing:
@article{kansizoglou2019active,
  title={An Active Learning Paradigm for Online Audio-Visual Emotion Recognition},
  author={Kansizoglou, Ioannis and Bampis, Loukas and Gasteratos, Antonios},
  journal={IEEE Transactions on Affective Computing},
  year={2019}
}
The VGGish weights, converted to PyTorch from this repository, are included in the ./data/weights/ path under the name pytorch_vggish.pth.
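As a quick sanity check, the converted checkpoint can be inspected with PyTorch. The snippet below is only a minimal sketch and assumes the .pth file stores a plain state_dict (a mapping from parameter names to tensors); adapt it if the checkpoint is packaged differently.

```python
import torch

# Minimal sanity check for the converted VGGish weights.
# Assumption: the file stores a plain state_dict; adjust if it is packaged differently.
state_dict = torch.load("./data/weights/pytorch_vggish.pth", map_location="cpu")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```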
The provided code has been tested with Python 3.7.4 and PyTorch 1.4.0.
TO BE UPDATED
The params.json file sets the training hyper-parameters, the exploited modality from the set {"audio", "visual", "fusion"}, and the name of the speaker that is excluded from the training dataset for evaluation. Note that the Leave-One-Speaker-Out and Leave-One-Speakers-Group-Out schemes are adopted.
The models are trained through two .csv files that contain the paths of the training and evaluation samples, respectively. These files shall be named training_data.csv and evaluation_data.csv and stored inside ./data/speaker_folder, where speaker_folder shall be given to the "speaker" variable in the params.json file.
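A possible params.json could look like the illustration below. Only the "speaker" variable and the modality choice are described above; the remaining keys, and the exact key names themselves, are hypothetical placeholders for the training hyper-parameters and may differ from the ones actually read by main.py.

```json
{
    "speaker": "speaker_folder",
    "modality": "fusion",
    "learning_rate": 0.0001,
    "batch_size": 32,
    "epochs": 100
}
```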
Run python3 main.py train, or simply python3 main.py, to train the model.
In order to test the model on the validation data, run python3 main.py test.