Hello author, your paper has inspired me a lot. In the abstract, you mention that DCMHA can be used as a direct replacement for MHA in any transformer architecture to obtain the corresponding DCFormer. Could you explain how this replacement is done in practice?