GeWu-Lab / OGM-GE_CVPR2022

The repo for "Balanced Multimodal Learning via On-the-fly Gradient Modulation", CVPR 2022 (ORAL)
MIT License

problem about how to process Film #20

Open fanyunfeng-bit opened 2 years ago

fanyunfeng-bit commented 2 years ago

The FiLM fusion method is a little more complex than sum or concat. With FiLM, the classifier's input dimension is the same as the embedding dimension, so the code here may not be right: [image]. How do you handle the FiLM fusion method, and how do you calculate the imbalance coefficient ρ?

echo0409 commented 2 years ago

Thanks for your interest in our work. FiLM is not a simple fusion method like sum or concat: the classifier output cannot be directly split into two modality-independent parts. Hence, we use the uni-modal part before fusion to evaluate each modality's contribution. A potential strategy is provided in the released code. More strategies for such cases are worth exploring in the future.

Link: https://github.com/GeWu-Lab/OGM-GE_CVPR2022/blob/main/models/fusion_modules.py
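For readers following along, here is a minimal, framework-free sketch of how ρ could be computed once per-modality logits are available. It is one reading of the idea above, not the confirmed logic of the released code. It assumes that `logits_a` and `logits_v` were obtained by passing each pre-fusion uni-modal embedding through some classifier head; how that head is chosen under FiLM is exactly the open question in this thread. The function names and the `alpha` value are hypothetical. The score and coefficient follow the form described in the paper: the softmax probability of the true class summed over the batch, and `1 - tanh(alpha * rho)` for the dominant modality.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def modality_score(batch_logits, labels):
    """Sum over the batch of the softmax probability given to the true class."""
    return sum(softmax(logits)[y] for logits, y in zip(batch_logits, labels))


def discrepancy_ratios(logits_a, logits_v, labels):
    """Return (rho_a, rho_v): each modality's score relative to the other's."""
    s_a = modality_score(logits_a, labels)
    s_v = modality_score(logits_v, labels)
    return s_a / s_v, s_v / s_a


def modulation_coefficient(rho, alpha):
    """Damp the gradient of the dominant modality (rho > 1); leave the other at 1."""
    return 1.0 - math.tanh(alpha * rho) if rho > 1.0 else 1.0


# Toy batch: audio logits are confident and correct, visual logits are uniform.
# In practice these would come from the (assumed) per-modality classifier head.
logits_a = [[2.0, 0.0], [0.0, 2.0]]
logits_v = [[0.0, 0.0], [0.0, 0.0]]
labels = [0, 1]

rho_a, rho_v = discrepancy_ratios(logits_a, logits_v, labels)
k_a = modulation_coefficient(rho_a, alpha=0.1)  # alpha is an illustrative value
k_v = modulation_coefficient(rho_v, alpha=0.1)
print(rho_a, rho_v, k_a, k_v)
```

In this toy batch the audio branch dominates, so only its coefficient drops below 1. With real models the same functions would be applied to the logits of each mini-batch, and the resulting coefficients would scale the gradients of the corresponding encoder.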

shicaiwei123 commented 1 year ago

Hello,

I have the same question about calculating ρ when using FiLM and gated fusion.

What does it mean to select the uni-modal part before fusion to evaluate the uni-modal contribution? Do you mean using the unimodal representation before fusion directly? If so, how is ρ calculated from the unimodal representation alone? Could you give me more details?