aiden200 / 2D3MF

Code and models for the paper "2D3MF: Deepfake Detection using Multi Modal Middle Fusion"

Pre-train ResNet for emotion detection #16

Closed adrianSRoman closed 7 months ago

adrianSRoman commented 7 months ago

The task is to pre-train a ResNet model on a simple emotion detection classification task. The input to the ResNet should be MFCCs computed from 1 second of audio at a sampling rate of sr=44100 Hz with n_mfcc=10, i.e. you can use `librosa.feature.mfcc(y=y_1sec_audio, sr=44100, n_mfcc=10)`.
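For reference, a minimal sketch of this feature extraction (the file path and variable names below are placeholders, not paths from the repo):

```python
import librosa
import numpy as np

SR = 44100   # sampling rate stated in the issue
N_MFCC = 10  # number of MFCC coefficients stated in the issue

def mfcc_features(y_1sec_audio: np.ndarray) -> np.ndarray:
    """Compute MFCCs for a 1-second mono clip sampled at 44.1 kHz."""
    # Output shape: (n_mfcc, n_frames), roughly (10, 87) with librosa's default hop length
    return librosa.feature.mfcc(y=y_1sec_audio, sr=SR, n_mfcc=N_MFCC)

# Example usage: load the first second of one RAVDESS clip (path is hypothetical)
y, _ = librosa.load("ravdess/Actor_01/03-01-01-01-01-01-01.wav", sr=SR, duration=1.0)
mfcc = mfcc_features(y)
print(mfcc.shape)  # (10, n_frames)
```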

The network can be trained with audio clips from the RAVDESS dataset. There are 8 labels to predict: 01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised.

This pre-trained network will then be used as a feature extractor within our 2D3MF pipeline.
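One possible way to set up the pre-training is sketched below with torchvision's ResNet-18; the specific ResNet variant, the single-channel input stem, and the 8-way classification head are assumptions for illustration, not the repo's actual implementation:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

NUM_EMOTIONS = 8  # RAVDESS labels 01-08 listed above

def build_emotion_resnet() -> nn.Module:
    """ResNet-18 adapted to single-channel MFCC 'images' with an 8-way emotion head."""
    model = resnet18(weights=None)  # random init (torchvision >= 0.13 API)
    # MFCC input is (batch, 1, n_mfcc, n_frames), so the stem takes 1 input channel
    model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    model.fc = nn.Linear(model.fc.in_features, NUM_EMOTIONS)
    return model

model = build_emotion_resnet()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# One illustrative training step on a dummy batch of MFCCs
x = torch.randn(16, 1, 10, 87)            # (batch, channel, n_mfcc, n_frames)
y = torch.randint(0, NUM_EMOTIONS, (16,))  # zero-based emotion indices
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```

Once trained, the classification head (`model.fc`) could be swapped for `nn.Identity()` so the backbone emits fixed-size embeddings for the 2D3MF fusion stage.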

aromanusc commented 7 months ago

Filename identifiers (a filename-parsing sketch follows this list):

- Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
- Vocal channel (01 = speech, 02 = song).
- Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
- Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the 'neutral' emotion.
- Statement (01 = "Kids are talking by the door", 02 = "Dogs are sitting by the door").
- Repetition (01 = 1st repetition, 02 = 2nd repetition).
- Actor (01 to 24. Odd-numbered actors are male, even-numbered actors are female).
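These identifiers are encoded as seven two-digit, hyphen-separated fields in each RAVDESS filename, so the emotion label can be read directly from the name. A small parsing sketch (the example filename is illustrative):

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class RavdessLabel:
    modality: int
    vocal_channel: int
    emotion: int       # 1-8, matching the list above
    intensity: int
    statement: int
    repetition: int
    actor: int

def parse_ravdess_filename(path: str) -> RavdessLabel:
    """Parse the seven two-digit identifiers from a RAVDESS filename,
    e.g. '03-01-06-01-02-01-12.wav'."""
    fields = Path(path).stem.split("-")
    return RavdessLabel(*(int(f) for f in fields))

label = parse_ravdess_filename("03-01-06-01-02-01-12.wav")
print(label.emotion)  # 6 -> 'fearful'
# For training, shift to a zero-based class index: label.emotion - 1
```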
aromanusc commented 7 months ago

Developed under https://github.com/aiden200/2D3MF/tree/user/steve/resnet