Closed — gaopeng-eugene closed this issue 7 years ago
I assume you are referring to the experimental results in Table 1 of https://arxiv.org/pdf/1705.06676.pdf:
To point out the performance variation due to the fusion modules, we first compare MUTAN to state-of-the-art bilinear models, under the same experimental framework. We do not use attention models here.
So in those experiments, the image features are global vectors of size 2048 instead of 14x14x2048 feature maps.
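To make the difference concrete, here is a minimal sketch of MLB-style low-rank bilinear fusion without attention, where the question embedding and a single 2048-d global image feature are projected and merged by an elementwise product. Dimensions, variable names, and random initialization are illustrative assumptions, not the repo's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

d_q, d_v, d_h = 2400, 2048, 1200  # hypothetical question/image/hidden dims

# Hypothetical projection matrices (learned parameters in practice)
U = rng.standard_normal((d_h, d_q)) * 0.01
V = rng.standard_normal((d_h, d_v)) * 0.01

q = rng.standard_normal(d_q)  # question embedding
v = rng.standard_normal(d_v)  # global image feature: 2048-d, no attention,
                              # i.e. no 14x14 spatial grid

# Low-rank bilinear fusion: elementwise product of the two projections
z = np.tanh(U @ q) * np.tanh(V @ v)
print(z.shape)  # (1200,)
```

With attention, `v` would instead be a weighted sum over the 14x14 grid of 2048-d vectors; here the fusion module is compared in isolation on the pooled feature.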
What is the performance of MLB+attn in your code?
You can download the zip file.
Why is your MLB baseline so low?