Open daichuangye opened 3 months ago
I have the same question. I can't find the definition of the Local Cross-Attention Fusion module mentioned in the paper. In the forward pass, I only see a simple addition of the convolutional features and the Swin features. Could you explain this?
Hello Author: In the CrossAttention class in utils.py, there is only one input parameter x, so it actually computes self-attention. Is your code inconsistent with what the paper describes?
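For anyone following along, here is a minimal NumPy sketch of the distinction being raised. This is not the repository's actual code; the function and weight names (`self_attention`, `cross_attention`, `Wq`, `Wk`, `Wv`) are hypothetical. The point is that an attention block with a single input `x` derives Q, K, and V from the same tensor (self-attention), whereas a cross-attention fusion would take two inputs, e.g. queries from the conv branch and keys/values from the Swin branch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Q, K, V all come from the same input x -- this is what a
    # CrossAttention(x) with a single argument actually computes.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

def cross_attention(x_conv, x_swin, Wq, Wk, Wv):
    # Queries from one branch, keys/values from the other -- the
    # shape a cross-attention fusion of two feature streams takes.
    q = x_conv @ Wq
    k, v = x_swin @ Wk, x_swin @ Wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

rng = np.random.default_rng(0)
d = 8
x_conv = rng.standard_normal((4, d))  # hypothetical conv-branch tokens
x_swin = rng.standard_normal((4, d))  # hypothetical Swin-branch tokens
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

out = cross_attention(x_conv, x_swin, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

With two distinct inputs the two functions generally produce different outputs, which is why a single-argument forward cannot be the cross-attention described in a paper.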