MILVLG / mcan-vqa

Deep Modular Co-Attention Networks for Visual Question Answering
Apache License 2.0

Co-Attention? #44

Closed dreichCSL closed 9 months ago

dreichCSL commented 1 year ago

The MCAN paper suggests that SGA (i.e. a guided attention module) is only used for question-guided attention over image content, but not the other way around (image-guided attention over question content). Could the authors please explain why they call this "CO-"attention even though there's no image-guided attention over question content? Or did I misunderstand the paper?

Greatly appreciate a response!

MIL-VLG commented 9 months ago

We have SA within each modality and GA across modalities. In the paper we needed a simple name for this composite attention structure, so we called it co-attention. As mentioned in the paper, we did try the symmetric co-attention structure you describe (e.g., SGA-SGA), but it gave no performance improvement.
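For readers following along, here is a minimal PyTorch sketch of the layout described above: SA units on the question, then SGA units on the image features guided by the encoded question, with no attention in the reverse direction. This is an illustrative sketch, not the repo's actual code; module names, hyperparameters, and the omission of feed-forward sublayers are simplifying assumptions.

```python
import torch
import torch.nn as nn

class SA(nn.Module):
    """Self-attention within one modality (residual + layer norm)."""
    def __init__(self, d_model=512, n_head=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        out, _ = self.attn(x, x, x)          # query = key = value = x
        return self.norm(x + out)

class SGA(nn.Module):
    """Self-attention on x, then guided attention: x queries the other modality y."""
    def __init__(self, d_model=512, n_head=8):
        super().__init__()
        self.self_attn = SA(d_model, n_head)
        self.guided = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, y):
        x = self.self_attn(x)
        out, _ = self.guided(x, y, y)        # keys/values come from y
        return self.norm(x + out)

class CoAttention(nn.Module):
    """Encoder-decoder stacking: question goes through SA units only;
    image features go through SGA units guided by the encoded question."""
    def __init__(self, n_layers=6, d_model=512, n_head=8):
        super().__init__()
        self.enc = nn.ModuleList(SA(d_model, n_head) for _ in range(n_layers))
        self.dec = nn.ModuleList(SGA(d_model, n_head) for _ in range(n_layers))

    def forward(self, q, v):                 # q: question tokens, v: image regions
        for layer in self.enc:
            q = layer(q)
        for layer in self.dec:
            v = layer(v, q)                  # question-guided only, never image-guided
        return q, v

q = torch.randn(2, 14, 512)                  # batch of 14-token questions
v = torch.randn(2, 100, 512)                 # batch of 100 image regions
q_out, v_out = CoAttention()(q, v)
print(q_out.shape, v_out.shape)
```

The symmetric variant discussed in the thread would additionally run SGA units on the question, guided by the image features; per the authors, that added no accuracy.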