Open fanyunfeng-bit opened 2 years ago
Thanks for your interest in our work. FiLM is not a simple fusion method like sum or concat, so the classifier output cannot be directly split into two modality-independent parts. Hence, we take the uni-modal part before fusion to evaluate each modality's contribution. One potential strategy is provided in the released code. Other strategies for such cases are worth exploring in the future.
Link: https://github.com/GeWu-Lab/OGM-GE_CVPR2022/blob/main/models/fusion_modules.py
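For anyone following along, here is a minimal sketch of how ρ could be computed once per-modality logits are available. It assumes ρ is the ratio of the summed ground-truth-class softmax scores of the two modalities over a batch. It also assumes, without confirmation from the authors, that for FiLM the shared classifier is simply applied to each uni-modal embedding taken before fusion, which is only possible because the classifier input dimension equals the embedding dimension. All function names are hypothetical, and plain Python is used instead of PyTorch to keep the example self-contained.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weight, bias, x):
    """Compute weight @ x + bias for a (classes x dim) weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def discrepancy_ratio(logits_v, logits_a, labels):
    """Assumed definition: rho_v = sum_i s_i^v / sum_i s_i^a, where s_i is
    the softmax probability assigned to the ground-truth class."""
    score_v = sum(softmax(l)[y] for l, y in zip(logits_v, labels))
    score_a = sum(softmax(l)[y] for l, y in zip(logits_a, labels))
    return score_v / score_a


def unimodal_logits(weight, bias, embeddings):
    """Hypothetical FiLM strategy: feed each pre-fusion uni-modal embedding
    through the shared classifier (assumption, not the authors' method)."""
    return [linear(weight, bias, e) for e in embeddings]


# Toy batch: 2 samples, 2-dim embeddings, 2 classes.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
visual = [[3.0, 0.0], [0.0, 3.0]]   # confident and correct
audio = [[0.5, 0.5], [0.5, 0.5]]    # uninformative
labels = [0, 1]

rho_v = discrepancy_ratio(unimodal_logits(W, b, visual),
                          unimodal_logits(W, b, audio),
                          labels)
print("rho_v > 1 means the visual branch dominates:", rho_v > 1.0)
```

Whether these pre-fusion logits are a fair proxy for each modality's contribution under FiLM is exactly the open question in this thread, so treat the result as a heuristic.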
Hello,
I have the same question about calculating ρ when using FiLM and gated fusion.
What does it mean to select the uni-modal part before fusion to evaluate the uni-modal contribution? Do you mean using the uni-modal representation before fusion directly? If so, how is ρ calculated from the uni-modal representation alone? Could you give more details?
The FiLM fusion method is a little more complex than sum or concat. Also, the classifier input dimension is the same as the embedding dimension, so the code here may not be right. How do you handle the FiLM fusion method, and how do you calculate the imbalance coefficient ρ?
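To make the dimension concern concrete, here is a small sketch of why slicing the classifier weight works for concat fusion. With concat, the classifier input is the two embeddings stacked, so the weight matrix splits column-wise into an audio half and a visual half, and the fused logits are exactly the sum of the two uni-modal logits. With FiLM, the classifier sees a single embedding-sized vector in which one modality scales and shifts the other, so there is no such column split. The audio-first ordering, the even split of the bias, and all names below are assumptions for illustration.

```python
def linear(weight, bias, x):
    """Compute weight @ x + bias for a (classes x dim) weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def split_concat_logits(weight, bias, a, v):
    """Split a concat-fusion classifier into per-modality logits.

    Assumes the fused input is [a ; v] (audio first) and that the bias is
    shared equally between the two branches."""
    d = len(a)
    w_a = [row[:d] for row in weight]   # columns that multiply the audio part
    w_v = [row[d:] for row in weight]   # columns that multiply the visual part
    half_bias = [bi / 2 for bi in bias]
    return linear(w_a, half_bias, a), linear(w_v, half_bias, v)


# Toy classifier: 2 classes, two 2-dim embeddings concatenated to 4 dims.
W = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 1.0, 0.0, 1.0]]
b = [2.0, 4.0]
a = [1.0, 1.0]
v = [2.0, 0.0]

out_a, out_v = split_concat_logits(W, b, a, v)
fused = linear(W, b, a + v)             # list concatenation plays the role of concat
recombined = [x + y for x, y in zip(out_a, out_v)]
print("split logits sum back to fused logits:", recombined == fused)
```

For FiLM, the weight has only embedding-dimension columns, so slicing it in half would cut an embedding in two rather than separate the modalities.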