dreichCSL closed this 9 months ago
We have SA within each modality and GA across modalities. When writing the paper, we needed a simple name for this composite attention structure, so we use the name co-attention. In our paper, we mention that we tried the symmetric co-attention structure you expected (e.g., SGA-SGA), but it gave no performance improvement.
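For readers following the thread, here is a minimal NumPy sketch of the two layouts being discussed: the asymmetric one (SA on the question, SA plus question-guided GA on the image) and the symmetric SGA-SGA variant. It is an illustration only, not the MCAN implementation. The function names, feature sizes, and random inputs are hypothetical, and the learned projections, multi-head splitting, layer normalization, and feed-forward sublayers are all left out. Which features guide which in the symmetric variant is also an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values):
    # Scaled dot-product attention; returns outputs and attention weights.
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d))
    return weights @ values, weights

def self_attention(x):
    # SA: a modality attends to itself (residual connection kept).
    out, _ = attention(x, x, x)
    return x + out

def guided_attention(x, guide):
    # GA: features x attend to the features of the other modality.
    out, _ = attention(x, guide, guide)
    return x + out

# Hypothetical toy inputs: 4 question-word features, 6 image-region features.
rng = np.random.default_rng(0)
question = rng.normal(size=(4, 8))
image = rng.normal(size=(6, 8))

# Asymmetric layout: the question only gets SA,
# the image gets SA followed by question-guided GA.
q_out = self_attention(question)
img_out = guided_attention(self_attention(image), q_out)

# Symmetric layout (SGA-SGA): each modality is also guided by the other.
q_sym = guided_attention(self_attention(question), image)
img_sym = guided_attention(self_attention(image), question)
```

In the asymmetric layout the question features never read from the image, which is the point raised in the question. The symmetric layout adds that second direction.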
The MCAN paper suggests that SGA (i.e. a guided attention module) is only used for question-guided attention over image content, but not the other way around (image-guided attention over question content). Could the authors please explain why they call this "CO-"attention even though there's no image-guided attention over question content? Or did I misunderstand the paper?
Greatly appreciate a response!