declare-lab / multimodal-deep-learning

This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
MIT License

About IEMOCAP sentence-level audio features #3

Open Luyizhe opened 2 years ago

Luyizhe commented 2 years ago

Hello, can you share the way you extracted audio features in the work "Multi-level Multiple Attentions for Contextual Multimodal Sentiment Analysis"? I have no idea how to extract the 100-dimensional sentence-level audio features. Thank you!

Luyizhe commented 2 years ago

Hello, I want to try data augmentation, but I don't have consistent features. Reading your paper "Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos", I see that you transform the 6373-dimensional features to 100 dimensions using an FC layer, but I can't obtain appropriate weight matrices. Can you share the weights? Thank you!

soujanyaporia commented 2 years ago

We used openSMILE and then fed that to an FC network with a 100-dim output. This FC network can be trained using your training dataset's labels. Alternatively, you can use other audio features as shown here: https://github.com/soujanyaporia/MUStARD
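Since the original FC weights aren't shared, here is a minimal PyTorch sketch of the setup described above: project the 6373-dimensional openSMILE utterance vectors (the size matches openSMILE's ComParE functional set) down to 100 dimensions with an FC layer, train it as a classifier on your own emotion labels, and keep the 100-dim activations as the sentence-level audio features. The hidden activation, optimizer, learning rate, and class count are assumptions, not the authors' settings; only the 6373→100 projection trained on the dataset's labels comes from this thread.

```python
# Sketch only, not the authors' released code.
import torch
import torch.nn as nn

class AudioProjector(nn.Module):
    def __init__(self, in_dim=6373, feat_dim=100, num_classes=6):
        super().__init__()
        self.fc = nn.Linear(in_dim, feat_dim)            # 6373 -> 100
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        h = torch.tanh(self.fc(x))   # 100-dim sentence-level feature
        return self.classifier(h), h

model = AudioProjector()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Placeholder batch; in practice, `features` are openSMILE vectors
# and `labels` the emotion labels from your own training split.
features = torch.randn(32, 6373)
labels = torch.randint(0, 6, (32,))

for _ in range(10):
    logits, _ = model(features)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with torch.no_grad():
    _, audio_100d = model(features)  # use these as the audio features
```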

Penglikai commented 1 year ago

> We used openSMILE and then fed that to an FC network with a 100-dim output. This FC network can be trained using your training dataset's labels. Alternatively, you can use other audio features as shown here: https://github.com/soujanyaporia/MUStARD

Hi, thanks for your clarification. Could you please share the scripts for the dimension-reduction process? I am trying to replicate the feature extraction but am having trouble with the FC network settings for dimension reduction.

BTW, may I know why the librosa features have a different size for each audio utterance? Thank you!
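(On the last question, a likely explanation, not confirmed in this thread: librosa's frame-level features have a time axis whose length depends on utterance duration, so each utterance yields a different-sized matrix. A minimal sketch, assuming MFCCs and an arbitrary `n_mfcc=40`, showing the variable shape and mean-pooling over time as one common way to get a fixed-size utterance vector:)

```python
import librosa

# Any utterance; librosa's bundled example audio stands in here.
y, sr = librosa.load(librosa.ex("trumpet"))

# Shape is (n_mfcc, n_frames); n_frames grows with audio length,
# which is why per-utterance feature matrices differ in size.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)

# Mean-pool over the time axis for a fixed-size (40,) vector.
utterance_vec = mfcc.mean(axis=1)
```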