Audio-Visual-Emotion-and-Sentiment-Research

Deep Neural Network and its application with TensorFlow project

Members and Corresponding parts:

Audio parts: Enis Berk Çoban and Yunhua Zhao

We train separate emotion-detection models on the audio-song, audio-speech, and combined audio-song-speech files. We also try different neural network architectures (LSTMs, CNNs) along with a pre-trained model (VGGish) for transfer learning.

Video parts: Patrick Jean-Baptiste

We use images of actors' faces expressing an emotion. The images are extracted from the video-only and audio-visual files of the RAVDESS dataset, for both speech and song. The objective is to build a visual model that recognizes emotions from images.

Period of our project:

Explore and pre-process the dataset:\ Enis extracted audio from the video files;\ Yunhua decoded the filenames;\ Patrick extracted images from the video files.
\ Yunhua trained an LSTM model on the audio-song files and measured its accuracy, then used the same model on the audio-speech files, and finally on the combined audio-song-speech files, measuring accuracy each time.\ Enis generated VGGish embeddings to train a model on the audio-song-speech files.\ Patrick did the initial video preprocessing.
\ Enis split the dataset into train, validation, and test sets and saved the split as a CSV file, so that everyone could train on the same sets and we could merge models or their outputs.\
\ Patrick detected and extracted the actors' faces from the images.
\ Enis tried to reproduce Yunhua's results and discovered a bug;\ Yunhua tried using Enis' model output with Yunhua's own model;\ Patrick trained a model on the split dataset for visual emotion classification.
\ Enis provided several organized files and led the team in moving forward.
\ Enis trained a model that has a separate module for each type of input. Yunhua merged the original audio and video features and fed the merged features to a single model made of dense layers.
We prepared the presentation.
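
The shared train/validation/test split from the timeline can be sketched as follows. This is a minimal illustration, not the script used in the project. It assumes RAVDESS-style filenames whose last dash-separated field is the actor ID, and it assumes the split is made by actor so that no actor appears in more than one set (the criterion actually used is not stated here). The names `split_by_actor` and `write_split_csv` and the CSV column names are hypothetical.

```python
import csv
import io
import random


def actor_of(filename):
    """Return the actor ID, assumed to be the last dash-separated field of the name."""
    stem = filename.rsplit(".", 1)[0]
    return stem.split("-")[-1]


def split_by_actor(filenames, val_actors=2, test_actors=2, seed=0):
    """Assign every file to train, validation, or test based on its actor.

    Splitting by actor (an assumption of this sketch) keeps all recordings
    of one person inside a single set.
    """
    actors = sorted({actor_of(name) for name in filenames})
    random.Random(seed).shuffle(actors)  # seeded, so the split is reproducible
    test = set(actors[:test_actors])
    val = set(actors[test_actors:test_actors + val_actors])

    rows = []
    for name in sorted(filenames):
        actor = actor_of(name)
        if actor in test:
            split = "test"
        elif actor in val:
            split = "validation"
        else:
            split = "train"
        rows.append({"filename": name, "actor": actor, "split": split})
    return rows


def write_split_csv(rows, handle):
    """Write the split table to an open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=["filename", "actor", "split"])
    writer.writeheader()
    writer.writerows(rows)
```

Writing the table once and reading it everywhere is what lets the audio and video models be evaluated and merged on identical samples.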

File organization:

We use "Issues" to track open problems, such as:
- Creating dataset splits
- How to merge audio and video features?
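
One possible answer to the merging question is feature-level fusion: align the audio and video feature vectors of the same recording and concatenate them before a shared classifier. The sketch below only illustrates that idea under stated assumptions and is not the project's code. It assumes one feature vector per file, and it assumes that the audio and video files of one recording differ only in the first (modality) field of the RAVDESS filename, so the remaining six fields can serve as a join key. The helpers `sample_key` and `fuse_features` are hypothetical.

```python
def sample_key(filename):
    """Drop the extension and the leading modality field of a RAVDESS-style name.

    Assumption: the remaining six fields identify the recording in both
    the audio and the video files.
    """
    stem = filename.rsplit(".", 1)[0]
    return "-".join(stem.split("-")[1:])


def fuse_features(audio_feats, video_feats):
    """Concatenate audio and video vectors for recordings present in both inputs.

    Both arguments map a filename to a sequence of floats. Recordings that
    are missing from either side are skipped.
    """
    video_by_key = {sample_key(name): vec for name, vec in video_feats.items()}
    fused = {}
    for name, audio_vec in audio_feats.items():
        key = sample_key(name)
        if key in video_by_key:
            # Audio features first, then video features, in a fixed order.
            fused[key] = list(audio_vec) + list(video_by_key[key])
    return fused
```

The fused vectors could then be fed to a small stack of dense layers. The alternative of merging model outputs instead of features would not need this alignment of raw vectors, only of predictions.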
Everybody completed their experiments in notebooks, such as:
We created some scripts that handle data processing, such as:
We also have a folder for assets such as models or embedding files:
- assets ReadMe
Our email list
dataset info
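
For quick reference, RAVDESS filenames consist of seven two-digit fields separated by dashes. The sketch below decodes them into labels. The code tables follow the naming convention published with the dataset as we understand it, so check them against the dataset info before relying on them. The function name `decode_ravdess` and the returned dictionary keys are hypothetical.

```python
MODALITY = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNEL = {"01": "speech", "02": "song"}
EMOTION = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}
INTENSITY = {"01": "normal", "02": "strong"}


def decode_ravdess(filename):
    """Decode a RAVDESS-style filename such as '03-01-06-01-02-01-12.wav'."""
    # Strip any directory part and the extension, then split the seven fields.
    stem = filename.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    fields = stem.split("-")
    if len(fields) != 7:
        raise ValueError("expected 7 dash-separated fields, got %r" % filename)
    modality, channel, emotion, intensity, statement, repetition, actor = fields
    return {
        "modality": MODALITY[modality],
        "vocal_channel": VOCAL_CHANNEL[channel],
        "emotion": EMOTION[emotion],
        "intensity": INTENSITY[intensity],
        "statement": int(statement),
        "repetition": int(repetition),
        "actor": int(actor),
        # Odd-numbered actors are male, even-numbered actors are female.
        "gender": "male" if int(actor) % 2 == 1 else "female",
    }
```

For example, `decode_ravdess("03-01-06-01-02-01-12.wav")` describes an audio-only speech recording with the emotion label `fearful`, spoken by actor 12.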
