-
Hi,
In the paper on Gated Multimodal Fusion, you use a slightly different formula from the one in the code. For example, the concatenation between img_new_resize and tweet_new_resize became a sum in t…
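For reference, a minimal PyTorch sketch of a gated multimodal unit showing the two variants in question: computing the gate from the concatenation of the two features (as in the paper) versus from their sum (as apparently in the code). The names img_new_resize / tweet_new_resize come from the question; the layer sizes and everything else are illustrative assumptions.

import torch
import torch.nn as nn

class GatedMultimodalUnit(nn.Module):
    # Illustrative gated fusion; dimensions are assumptions.
    def __init__(self, dim, gate_from="concat"):
        super().__init__()
        self.proj_img = nn.Linear(dim, dim)
        self.proj_txt = nn.Linear(dim, dim)
        self.gate_from = gate_from
        gate_in_dim = 2 * dim if gate_from == "concat" else dim
        self.gate = nn.Linear(gate_in_dim, dim)

    def forward(self, img_new_resize, tweet_new_resize):
        h_img = torch.tanh(self.proj_img(img_new_resize))
        h_txt = torch.tanh(self.proj_txt(tweet_new_resize))
        if self.gate_from == "concat":  # paper: gate from [img; tweet]
            gate_in = torch.cat([img_new_resize, tweet_new_resize], dim=-1)
        else:                           # code: gate from img + tweet
            gate_in = img_new_resize + tweet_new_resize
        z = torch.sigmoid(self.gate(gate_in))
        return z * h_img + (1 - z) * h_txt

fused = GatedMultimodalUnit(dim=256, gate_from="sum")(torch.randn(8, 256), torch.randn(8, 256))

The two variants differ only in the gate's input; the sum version halves the gate's parameter count, which may be why the code diverges from the paper.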
-
Hello, I would like to run some experiments with the ALBEF model. For this I reviewed your paper as well, but I am unable to understand why the first six layers of BERT-base were used as the text encoder and why la…
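For context, ALBEF reuses a single BERT-base: the first six transformer layers act as the unimodal text encoder, and the last six (extended with cross-attention to image features in ALBEF) act as the multimodal fusion encoder, so the split adds no parameters over BERT-base. A minimal sketch of that split, assuming the HuggingFace bert-base-uncased checkpoint; the cross-attention insertion done in the real repo is omitted here:

from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")
text_encoder_layers = bert.encoder.layer[:6]    # layers 0-5: text-only encoding
fusion_encoder_layers = bert.encoder.layer[6:]  # layers 6-11: multimodal fusion
print(len(text_encoder_layers), len(fusion_encoder_layers))  # 6 6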
-
When I fuse RGB and audio, I get the 78.64% AP from your paper. But when I use three modalities, the AP is worse than in your paper. In principle, fusing more modalities should perform better, but in fact it does not. I am …
-
Hi, I am about to submit my paper on semantic segmentation, and I am wondering which subject area I should choose. Could you please share your choice of SUBJECT AREAS with me?
Subject Areas:
Deep …
-
Hi,
We used this config to train the AVE task on a 3090, with the processed data you provided, but the accuracy we got is 73.31:
python3 /code/AVE/main_trans.py --Adapter_downsample=8 --batch_siz…
-
Are there any ways to bypass the data-preprocessing step for MBT ("Attention Bottlenecks for Multimodal Fusion") if I only want to do inference without passing in the actual data from AS? I notice the m…
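If the goal is just a smoke test of the forward pass, one common workaround is to skip the AudioSet pipeline entirely and feed random tensors shaped like the model's inputs. A hedged sketch; the shapes below and the checkpoint loader are assumptions to be replaced with whatever the MBT implementation you use actually expects:

import torch

rgb = torch.randn(1, 3, 32, 224, 224)  # (batch, channels, frames, height, width) -- assumed
spec = torch.randn(1, 1, 800, 128)     # (batch, 1, time, mel bins) log-mel clip -- assumed
# model = load_mbt_checkpoint("...")   # hypothetical loader; use the repo's own
# with torch.no_grad():
#     logits = model(rgb, spec)        # AudioSet has 527 classes, so expect (1, 527)

This only verifies shapes and runtime, of course; any accuracy numbers still require the real preprocessed data.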
-
Hello,
I'm working on reproducing the results in your paper "Attention Bottlenecks for Multimodal Fusion" and trying to implement MBT for other audiovisual video classification tasks.
However, the pr…
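In case it helps with the reimplementation, the core of MBT is a small set of shared bottleneck tokens that each modality's transformer layer attends over, with the updated bottlenecks averaged across modalities. A minimal PyTorch sketch using stock encoder layers rather than the paper's ViT blocks; the hidden size and B=4 bottleneck tokens follow the paper, the rest is illustrative:

import torch
import torch.nn as nn

class BottleneckFusionLayer(nn.Module):
    # One fusion layer: each modality attends over its own tokens plus
    # the shared bottlenecks; the two updated bottleneck sets are averaged.
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.audio_blk = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.video_blk = nn.TransformerEncoderLayer(dim, heads, batch_first=True)

    def forward(self, audio, video, bottleneck):
        na, nv = audio.size(1), video.size(1)
        a = self.audio_blk(torch.cat([audio, bottleneck], dim=1))
        v = self.video_blk(torch.cat([video, bottleneck], dim=1))
        bottleneck = 0.5 * (a[:, na:] + v[:, nv:])  # average the bottleneck updates
        return a[:, :na], v[:, :nv], bottleneck

layer = BottleneckFusionLayer()
audio, video = torch.randn(2, 196, 768), torch.randn(2, 196, 768)
btl = torch.randn(2, 4, 768)  # B = 4 bottleneck tokens, as in the paper
audio, video, btl = layer(audio, video, btl)

Because the modalities exchange information only through those few tokens, cross-modal bandwidth is deliberately limited, which is what the paper credits for the gains.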
-
Hello, thank you for your excellent work. Could you please provide the reproducible weight files for the FMB dataset? Many thanks!
-
Hello, I'm trying to apply the OGM-GE strategy to a multimodal fusion network with text, video, and audio modalities (e.g. MISA, MAG). However, when I use the SGD optimizer, the model training process moves on wi…
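One way to extend the two-modality OGM rule to three encoders is to compare each modality's confidence against the mean of the others and damp the gradients of whichever modality is running ahead. A sketch under that assumption (the three-way ratio is a generalization of ours, not the paper's formula; the GE term would additionally add zero-mean Gaussian noise to the damped gradients):

import math

def ogm_coefficients(scores, alpha=0.8):
    # scores: modality name -> scalar confidence, e.g. the softmax
    # probability of the true class from that modality's unimodal logits.
    coeffs = {}
    for m in scores:
        others = [v for k, v in scores.items() if k != m]
        rho = scores[m] / (sum(others) / len(others) + 1e-8)
        # Damp the dominant modality; leave the weaker ones untouched.
        coeffs[m] = 1.0 - math.tanh(alpha * (rho - 1.0)) if rho > 1.0 else 1.0
    return coeffs

coeffs = ogm_coefficients({"audio": 0.9, "video": 0.4, "text": 0.5})
# audio is dominant -> coeffs["audio"] < 1, others stay at 1.0.
# After loss.backward() and before optimizer.step(), scale each encoder:
# for p in audio_encoder.parameters():
#     if p.grad is not None:
#         p.grad *= coeffs["audio"]  # (+ Gaussian noise for the GE step)

With plain SGD this scaling acts directly on the update, whereas adaptive optimizers partly renormalize gradient magnitudes, which may explain optimizer-dependent behavior.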