Caiyun-AI / DCFormer


DCMHattention #3

Closed Wangjinhong1998 closed 5 months ago

Wangjinhong1998 commented 5 months ago

Congratulations on your significant breakthrough. I have a question for you: Can the DCMH attention mechanism mentioned in this article be applied to the transformer structure used in models like CLIP? I noticed that its implementation seems rather complex, and I have not been able to fully grasp it yet...

hilbertmeng commented 5 months ago

Thank you for your recognition. Yes, the image encoder (ViT) and the text encoder (Text Transformer) in CLIP can also adopt DCMHA. It is independent of the attention mask, so it can be incorporated into both causal attention (GPT-like) and bidirectional self-attention (BERT-like).
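For illustration only, here is a minimal sketch (not the repository's implementation) of how a DCMHA-style head-composition step can wrap ordinary multi-head attention. All names (`SimpleDCMHAttention`, `compose_scores`, `compose_values`) are hypothetical, and the composition is reduced to static head-mixing matrices; the paper's dynamic, input-conditioned composition is omitted. The point is that the mask is applied exactly as in standard attention, so the same module works with a causal mask or with no mask at all.

```python
# Hypothetical sketch of DCMHA-style head composition; not the official DCFormer code.
import math
import torch
import torch.nn as nn

class SimpleDCMHAttention(nn.Module):
    def __init__(self, dim, num_heads):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        # Static head-composition matrices: one applied to attention scores
        # before softmax, one applied to per-head outputs after attention.
        self.compose_scores = nn.Parameter(torch.eye(num_heads))
        self.compose_values = nn.Parameter(torch.eye(num_heads))

    def forward(self, x, attn_mask=None):
        # x: (batch, seq_len, dim); attn_mask: additive (seq_len, seq_len) mask or None.
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)

        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)  # (B, H, T, T)
        # Mix information across heads at the score level.
        scores = torch.einsum('gh,bhij->bgij', self.compose_scores, scores)

        # The mask is applied after composition, just as in standard attention,
        # so the same layer supports causal (GPT-like), bidirectional (BERT-like),
        # or ViT-style unmasked attention.
        if attn_mask is not None:
            scores = scores + attn_mask

        attn = scores.softmax(dim=-1)
        out = attn @ v  # (B, H, T, head_dim)
        # Mix per-head outputs across heads as well.
        out = torch.einsum('gh,bhtd->bgtd', self.compose_values, out)
        out = out.transpose(1, 2).reshape(B, T, -1)
        return self.proj(out)

if __name__ == "__main__":
    layer = SimpleDCMHAttention(dim=64, num_heads=4)
    x = torch.randn(2, 10, 64)
    causal = torch.full((10, 10), float('-inf')).triu(1)  # GPT-like causal mask
    print(layer(x, attn_mask=causal).shape)  # causal attention
    print(layer(x).shape)                    # bidirectional (BERT/ViT-like)
```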