TmacMai / Multimodal-Information-Bottleneck

Multimodal Information Bottleneck: Learning Minimal Sufficient Unimodal and Multimodal Representations (MIB for multimodal sentiment analysis)
MIT License

Some questions #4

Closed. GalioMax closed this issue 2 years ago

GalioMax commented 2 years ago

After reading your paper, I am very excited because it gave me a lot of inspiration. However, I am a little confused about some of the details, and I hope you can answer when you have time.

  1. I(y;z^m) appears in the objective functions of both L_MIB and C_MIB. According to Figure 1 and Figure 3 in the paper, does y here represent y^m, where m ∈ {a, v, l}, rather than the final predicted output?
  2. The paper includes an emotion recognition task using the MOSEI and IEMOCAP datasets, but I could not find the relevant files in the code you provided. Could you provide the files for these two datasets?

I am looking forward to your reply. Thank you!

GalioMax commented 2 years ago

I have another question. In your paper, the objective function R = I(z;y) - βI(z;x) is maximized, which means maximizing the first term and minimizing the second. Can we use MINE to maximize the first term and CLUB to minimize the second, and thereby maximize R? Looking forward to your reply, thank you!

TmacMai commented 2 years ago
  1. Here y denotes the true label. Our datasets do not have unimodal labels for each modality, so we instead maximize the mutual information between the label of the multimodal sample and the encoded representation of each modality (z^m).

  2. We will release the CMU-MOSEI dataset soon (I am quite busy right now). The IEMOCAP experiment was conducted on another platform, so it might take more time to release the code.
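
To illustrate point 1, here is a minimal sketch of scoring every unimodal head against the same multimodal label. It is not the repository's code. It assumes a fixed-scale Laplace decoder q(y|z^m), under which maximizing the usual variational lower bound on I(y;z^m) reduces to minimizing mean absolute error up to constants. The function names and the modality keys are hypothetical, and the compression terms of the objective are omitted.

```python
def mean_abs_error(preds, targets):
    """Mean absolute error between two equal-length lists of floats."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)


def shared_label_loss(unimodal_preds, fused_preds, labels):
    """Prediction loss in which every head is compared to the SAME labels.

    unimodal_preds: dict mapping a modality name (hypothetical keys such as
                    "a", "v", "l") to that head's predictions from z^m.
    fused_preds:    predictions from the fused multimodal representation.
    labels:         the multimodal ground-truth labels y (no y^m exists).
    """
    loss = mean_abs_error(fused_preds, labels)
    for preds in unimodal_preds.values():
        # No unimodal label is available, so y is reused for each modality.
        loss += mean_abs_error(preds, labels)
    return loss


labels = [1.0, 2.0]
fused = [1.0, 2.0]
heads = {"a": [1.0, 2.0], "v": [0.0, 2.0], "l": [1.0, 1.0]}
total = shared_label_loss(heads, fused, labels)
```

In this toy input the fused and acoustic heads are exact, while the visual and language heads each miss one sample by 1, so each contributes 0.5 to the total.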

TmacMai commented 2 years ago

> I have another question. In your paper, the objective function R = I(z;y) - βI(z;x) is maximized, which means maximizing the first term and minimizing the second. Can we use MINE to maximize the first term and CLUB to minimize the second, and thereby maximize R? Looking forward to your reply, thank you!

Of course you can. You can use any objectives satisfying the constraints.
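
As a rough sketch of what that combination could look like (not code from this repository), the functions below compute sample estimates of the two bounds from precomputed numbers. The MINE term is the Donsker-Varadhan lower bound on I(z;y), computed from critic scores on paired and shuffled (z, y) samples. The CLUB term is the upper bound on I(z;x), computed from the encoder log-densities log p(z_j | x_i). All function and argument names are hypothetical, and the critic and encoder that would produce these inputs are assumed to exist elsewhere.

```python
import math


def mine_lower_bound(joint_scores, marginal_scores):
    """Donsker-Varadhan estimate: mean(T on joint) - log mean(exp(T on marginals)).

    joint_scores:    critic outputs T(z_i, y_i) on paired samples.
    marginal_scores: critic outputs T(z_i, y_j) on shuffled (unpaired) samples.
    """
    joint_term = sum(joint_scores) / len(joint_scores)
    # log-mean-exp, stabilised by subtracting the largest score first
    m = max(marginal_scores)
    mean_exp = sum(math.exp(s - m) for s in marginal_scores) / len(marginal_scores)
    return joint_term - (m + math.log(mean_exp))


def club_upper_bound(log_cond):
    """CLUB estimate from a square matrix log_cond[i][j] = log p(z_j | x_i)."""
    n = len(log_cond)
    positive = sum(log_cond[i][i] for i in range(n)) / n     # paired samples
    negative = sum(sum(row) for row in log_cond) / (n * n)   # all pairs
    return positive - negative


def surrogate_objective(joint_scores, marginal_scores, log_cond, beta):
    """Estimate of the lower bound on R = I(z;y) - beta * I(z;x), for beta >= 0."""
    return (mine_lower_bound(joint_scores, marginal_scores)
            - beta * club_upper_bound(log_cond))


toy_log_cond = [[-1.0, -3.0], [-3.0, -1.0]]
r_hat = surrogate_objective([1.0, 1.0], [0.0, 0.0], toy_log_cond, beta=0.5)
```

Because the first term is bounded from below and the second from above, the combination bounds R from below when β ≥ 0, so maximizing it is a reasonable surrogate. In training, the critic would be updated to tighten the MINE bound while the encoder is updated on the combined value. Both quantities are noisy minibatch estimates, so they should be read as approximations.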

GalioMax commented 2 years ago
> 1. Here y denotes the true label. Our datasets do not have unimodal labels for each modality, so we instead maximize the mutual information between the label of the multimodal sample and the encoded representation of each modality (z^m).
> 2. We will release the CMU-MOSEI dataset soon (I am quite busy right now). The IEMOCAP experiment was conducted on another platform, so it might take more time to release the code.

Thank you very much for your helpful answer.

TmacMai commented 2 years ago

We have provided a link to download the CMU-MOSEI dataset.