GeWu-Lab / OGM-GE_CVPR2022

The repo for "Balanced Multimodal Learning via On-the-fly Gradient Modulation", CVPR 2022 (ORAL)
MIT License
221 stars, 18 forks

problem about how to process Film #20

Open fanyunfeng-bit opened 1 year ago

fanyunfeng-bit commented 1 year ago

The FiLM fusion method is a little more complex than sum or concat. The classifier input dim is the same as the embedding dim, so the code here may not be right: [image] I want to know how you deal with the FiLM fusion method. How do you calculate the imbalance coefficient ρ?

echo0409 commented 1 year ago

Thanks for your interest in our work. Unlike sum or concat, FiLM is not a simple fusion method: the output of the classifier cannot be directly split into two modality-independent parts. Hence, we select the uni-modal part before fusion to evaluate the uni-modal contribution. A potential strategy is provided in the released code. More strategies for such cases are worth exploring in the future.
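To make the contrast concrete, here is a minimal dependency-free sketch of why concat fusion allows the split that FiLM does not. With concat, the classifier is one linear layer over `[a; v]`, so its logits are exactly the sum of an audio part and a visual part. FiLM modulates one modality multiplicatively with the other, so no such additive split exists. The function names, the audio-first ordering, the halved bias, and the toy numbers are all assumptions for illustration, not the repository's code. ρ is taken here as the ratio of summed true-class softmax scores, which is how the paper's definition appears to read.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weight, bias, feat):
    """Compute weight @ feat + bias; weight is a list of rows, one per class."""
    return [sum(w * x for w, x in zip(row, feat)) + b
            for row, b in zip(weight, bias)]

def split_concat_logits(weight, bias, feat_a, feat_v):
    """Split the logits of W [a; v] + b into an audio part and a visual part.

    Assumption: the audio feature comes first in the concatenation and the
    bias is shared equally between the two parts.
    """
    d_a = len(feat_a)
    w_a = [row[:d_a] for row in weight]
    w_v = [row[d_a:] for row in weight]
    half_bias = [b / 2 for b in bias]
    return linear(w_a, half_bias, feat_a), linear(w_v, half_bias, feat_v)

def discrepancy_ratio(scores_v, scores_a):
    """rho: summed true-class scores of visual over those of audio."""
    return sum(scores_v) / sum(scores_a)

# Toy example (hypothetical values): 2 classes, 2-dim audio and visual features.
weight = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
bias = [0.2, -0.4]
feat_a, feat_v = [1.0, -1.0], [0.5, 2.0]
label = 0

logits_a, logits_v = split_concat_logits(weight, bias, feat_a, feat_v)
fused_logits = linear(weight, bias, feat_a + feat_v)  # equals logits_a + logits_v

score_a = softmax(logits_a)[label]
score_v = softmax(logits_v)[label]
rho = discrepancy_ratio([score_v], [score_a])  # > 1 means visual dominates
```

In a real training loop the two score lists would hold one entry per sample in the mini-batch, and the same arithmetic would run on tensors rather than Python lists.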

Link: https://github.com/GeWu-Lab/OGM-GE_CVPR2022/blob/main/models/fusion_modules.py

shicaiwei123 commented 1 year ago

Hello,

I have the same question about calculating ρ when using FiLM and gated fusion.

What does it mean to select the uni-modal part before fusion to evaluate the uni-modal contribution? Do you mean using the uni-modal representation before fusion directly? If so, how is ρ calculated from the uni-modal representation alone? Could you give more details?
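One possible reading of "selecting the uni-modal part before fusion" can be sketched as follows. It is an assumption, not a confirmed description of the authors' method. As noted in the opening post, the FiLM classifier's input dim equals the embedding dim, so each pre-fusion uni-modal feature could be passed through that same classifier head on its own to get a per-modality true-class score, and ρ would then be the ratio of the summed scores. All names and toy values below are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(weight, bias, feat):
    """Shared classifier head: weight @ feat + bias (one weight row per class)."""
    return [sum(w * x for w, x in zip(row, feat)) + b
            for row, b in zip(weight, bias)]

def true_class_score(weight, bias, feat, label):
    """Softmax probability the head assigns to the true class for one feature."""
    return softmax(classify(weight, bias, feat))[label]

def unimodal_ratio(weight, bias, feats_a, feats_v, labels):
    """rho = sum of visual true-class scores / sum of audio true-class scores.

    Assumption: feats_a and feats_v are the encoder outputs *before* fusion,
    and both have the same dim as the classifier input.
    """
    total_a = sum(true_class_score(weight, bias, a, y)
                  for a, y in zip(feats_a, labels))
    total_v = sum(true_class_score(weight, bias, v, y)
                  for v, y in zip(feats_v, labels))
    return total_v / total_a

# Toy batch of one sample (hypothetical values): audio is the confident modality.
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
rho = unimodal_ratio(weight, bias, feats_a=[[2.0, 0.0]], feats_v=[[0.0, 0.0]], labels=[0])
```

A value of ρ below 1, as in this toy batch, would indicate that audio is the dominant modality under this scoring. Whether the released code scores the uni-modal features this way should be checked against the linked `fusion_modules.py` and the training script.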