-
Hi! Thank you for your great work. I was looking at the code, and I see that deformable attention is only used in the cross-attention module of the decoder.
Why is deformable attention not used anywhere e…
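For context, here is my understanding of the deformable attention in question, reduced to a single feature level. This is a minimal sketch in the style of Deformable DETR; all module and parameter names are illustrative, not the repository's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableAttnSketch(nn.Module):
    """Single-level deformable attention: each query samples a small set of
    points from the value feature map instead of attending to all positions."""

    def __init__(self, dim=256, n_heads=8, n_points=4):
        super().__init__()
        self.n_heads, self.n_points = n_heads, n_points
        self.head_dim = dim // n_heads
        self.offsets = nn.Linear(dim, n_heads * n_points * 2)  # per-query (x, y) offsets
        self.weights = nn.Linear(dim, n_heads * n_points)      # per-point attention weights
        self.value_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, query, ref_points, value, h, w):
        # query: (B, Q, C); ref_points: (B, Q, 2), normalized (x, y) in [0, 1];
        # value: (B, H*W, C) flattened feature map of spatial size (h, w).
        B, Q, C = query.shape
        v = self.value_proj(value).view(B, h * w, self.n_heads, self.head_dim)
        v = v.permute(0, 2, 3, 1).reshape(B * self.n_heads, self.head_dim, h, w)

        # Predict pixel offsets around each reference point, normalize them,
        # and map the sampling locations into grid_sample's [-1, 1] range.
        off = self.offsets(query).view(B, Q, self.n_heads, self.n_points, 2)
        scale = torch.tensor([w, h], dtype=query.dtype, device=query.device)
        loc = ref_points[:, :, None, None, :] + off / scale
        grid = (2 * loc - 1).permute(0, 2, 1, 3, 4)
        grid = grid.reshape(B * self.n_heads, Q, self.n_points, 2)

        # Bilinearly sample the value map at the predicted locations.
        sampled = F.grid_sample(v, grid, align_corners=False)  # (B*heads, head_dim, Q, K)

        attn = self.weights(query).view(B, Q, self.n_heads, self.n_points).softmax(-1)
        attn = attn.permute(0, 2, 1, 3).reshape(B * self.n_heads, 1, Q, self.n_points)

        out = (sampled * attn).sum(-1)  # weighted sum over the sampled points
        out = out.view(B, self.n_heads, self.head_dim, Q).permute(0, 3, 1, 2).reshape(B, Q, C)
        return self.out_proj(out)
```

The appeal in the decoder is that each query samples only `n_points` locations instead of attending over the full H×W grid, which is why I am curious whether the same trick was considered elsewhere.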
-
I am currently trying to reproduce the results. Could you provide some training logs for reference?
-
Hello,
Thank you for your excellent work on this project!
While reviewing the code, I noticed a few discrepancies between the implementation and the manuscript's description, specifically in the…
-
Hi, I am aware that the implementation and source code of kernels like FMHA are not released. However, is there a guide or some reference I can use to create custom attention-related kernels? I would id…
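For concreteness, this is the plain reference computation I would validate a custom fused kernel against; open-source projects such as FlashAttention and the Triton fused-attention tutorial also show complete kernel implementations. The helper below is just a correctness baseline, and `my_fused_attention` is a hypothetical stand-in for one's own kernel.

```python
import math
import torch

def attention_reference(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, head_dim). A fused kernel must
    # reproduce exactly this math; the fusion is about never materializing
    # the full (seq_len x seq_len) score matrix in global memory, using an
    # online (running max/sum) softmax instead.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    return scores.softmax(dim=-1) @ v

# Validate a candidate kernel against the reference:
q = torch.randn(2, 8, 128, 64)
k, v = torch.randn_like(q), torch.randn_like(q)
ref = attention_reference(q, k, v)
# assert torch.allclose(my_fused_attention(q, k, v), ref, atol=1e-3)
```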
-
![image](https://github.com/YifanXu74/MQ-Det/assets/76269294/a9a166c8-abce-42f6-85f0-50bdf097f2d0)
In the paper, there are two cross-attention blocks on the right side of Fig. 1. How does GCP benefit from this structu…
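For reference, here is my rough reading of that two-step structure, sketched with illustrative names; the gating form is an assumption on my part, not the repository's code.

```python
import torch
import torch.nn as nn

class GCPSketch(nn.Module):
    """One possible reading of Fig. 1: vision queries first gather evidence
    from the target image, then the text embedding absorbs that evidence
    through a second cross-attention, modulated by a learned gate."""

    def __init__(self, dim=256, n_heads=8):
        super().__init__()
        self.xattn_img = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.xattn_txt = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # Gate starts near zero so training begins close to the plain
        # (query-free) text embedding.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, vision_queries, image_feats, text_emb):
        vq, _ = self.xattn_img(vision_queries, image_feats, image_feats)
        delta, _ = self.xattn_txt(text_emb, vq, vq)
        return text_emb + torch.tanh(self.gate) * delta
```

My guess is that the first cross-attention makes the queries image-conditioned while the second lets the class token pick among them, but I would appreciate confirmation.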
-
## Description
onnx: https://drive.google.com/file/d/1JgwgwIl71BnJRw2e9FtgV0DGSGzLy0OZ/view?usp=sharing
I tried to use fp32 for the MatMul in the self-attention and cross-attention layers on an A100, but it …
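For context, this is roughly what I did, assuming a mixed-precision TensorRT build; the `"attn"` name filter is illustrative and depends on how the graph was exported.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Without this flag, TensorRT is free to ignore per-layer precision requests.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Pin the attention MatMuls to fp32 while the rest of the network runs fp16.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.type == trt.LayerType.MATRIX_MULTIPLY and "attn" in layer.name:
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

engine = builder.build_serialized_network(network, config)
```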
-
Hi,
I'd like to know how you visualized the 2D and 3D heatmaps in "Figure 8: Motion-word cross-attention visualization" in your paper.
The attention matrix in [CrossAttention module](https://githu…
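For reference, this is the kind of minimal plotting code I imagine behind the 2D map (averaging the attention weights over heads); I assume the 3D version projects the per-frame weights onto the pose. All variable names here are illustrative.

```python
import matplotlib.pyplot as plt
import torch

def plot_cross_attn(attn, words):
    # attn: (heads, motion_len, num_words), e.g. the softmax output taken
    # from the CrossAttention module; average over heads for one 2D map.
    attn2d = attn.mean(dim=0).detach().cpu().numpy()
    fig, ax = plt.subplots(figsize=(8, 4))
    im = ax.imshow(attn2d, aspect="auto", cmap="viridis")
    ax.set_xticks(range(len(words)))
    ax.set_xticklabels(words, rotation=45, ha="right")
    ax.set_xlabel("words")
    ax.set_ylabel("motion frames")
    fig.colorbar(im, ax=ax, label="attention weight")
    fig.tight_layout()
    plt.show()
```

Is this close to what you did, or is there extra normalization before plotting?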
-
Thanks for your great work! I notice that the text length is usually less than 77 tokens. Why not mask the padding tokens in word_emb when performing cross-attention?
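For concreteness, this is the masking I have in mind, sketched with PyTorch's built-in cross-attention (names such as `word_emb` and the pad id are illustrative):

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)

motion_feat = torch.randn(2, 196, 512)   # queries
word_emb = torch.randn(2, 77, 512)       # keys/values, padded to length 77
token_ids = torch.randint(1, 100, (2, 77))
token_ids[:, 10:] = 0                    # assume 0 is the padding id

# True marks padding positions, which are excluded from the softmax, so
# padded word embeddings receive zero attention weight.
key_padding_mask = token_ids.eq(0)
out, weights = attn(motion_feat, word_emb, word_emb,
                    key_padding_mask=key_padding_mask)
```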
-
Recently, I have been studying this paper, and I am not entirely clear on Formula 4. The spatial weight map M is generated for each type of manipulation operation. Therefore, if there are s…