paper: What Do They "Meme"? A Metaphor-aware Multi-modal Multi-task Framework for Fine-grained Meme Understanding
We leverage metaphorical information as a text modality and propose a Metaphor-aware Multi-modal Multi-task Framework (M3F) for fine-grained meme understanding. Specifically, we design inter-modality attention, inspired by the Transformer, to capture the interaction between text and image. Moreover, intra-modality attention is applied to model the contradiction between the text and the metaphorical information. To learn the implicit interaction among different tasks, we introduce a multi-interactive decoder that uses gating networks to establish the relationships between the subtasks.
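The two attention steps and the gating step described above can be illustrated with a minimal sketch. It is not the M3F implementation: it assumes plain single-head scaled dot-product attention without learned projections, toy two-dimensional features, and a softmax gate with fixed logits. The names `attention`, `gated_mix`, `task_a_feat`, and `task_b_feat` are hypothetical and chosen only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value rows, one output coordinate at a time.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def gated_mix(task_feats, gate_logits):
    """Combine per-task feature vectors using softmax gate weights."""
    gates = softmax(gate_logits)
    dim = len(task_feats[0])
    return [sum(g * f[j] for g, f in zip(gates, task_feats)) for j in range(dim)]


# Toy features (assumed values, not from the paper).
text = [[1.0, 0.0], [0.0, 1.0]]                 # two text tokens
image = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]   # three image regions
metaphor = [[0.2, 0.8]]                         # one metaphor token

# Inter-modality: text queries attend over image keys/values.
text_image = attention(text, image, image)

# Intra-modality: text queries attend over metaphor keys/values (both textual).
text_metaphor = attention(text, metaphor, metaphor)

# Gating: mix two hypothetical task-specific features with equal gate logits.
task_a_feat = [1.0, 3.0]
task_b_feat = [3.0, 1.0]
shared = gated_mix([task_a_feat, task_b_feat], [0.0, 0.0])
```

In a trained model the queries, keys, and values would pass through learned projections, and the gate logits would be produced by a small network conditioned on the input rather than fixed by hand.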