-
Thanks for your great work! Why does this operation (overwriting the function `_get_unpad_data` with a monkey-patched function) implement packing without cross-contamination attention? C…
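For context, a minimal sketch of what such a monkey patch could look like, under my own assumption of the mechanism (this is not the repo's actual code): the replacement makes `cu_seqlens` mark the boundary of every packed segment, so flash-attention's varlen kernel never attends across segments.
```
import torch
import torch.nn.functional as F

def patched_get_unpad_data(attention_mask: torch.Tensor):
    # Assumption: attention_mask carries packed-segment ids (1, 2, 3, ...,
    # 0 = padding) rather than the usual 0/1 mask.
    seqlens = []
    for row in attention_mask:
        counts = torch.bincount(row[row > 0])  # length of each packed segment
        seqlens.append(counts[counts > 0])
    seqlens_in_batch = torch.cat(seqlens).to(torch.int32)
    indices = torch.nonzero(attention_mask.flatten() > 0, as_tuple=False).flatten()
    max_seqlen_in_batch = int(seqlens_in_batch.max())
    # One cumulative boundary per segment: the varlen flash-attention kernel
    # treats each [cu_seqlens[i], cu_seqlens[i+1]) slice as an independent
    # sequence, which is what prevents cross-contamination between samples.
    cu_seqlens = F.pad(torch.cumsum(seqlens_in_batch, dim=0, dtype=torch.int32), (1, 0))
    return indices, cu_seqlens, max_seqlen_in_batch
```
The patch itself would then just be an assignment of this function over transformers' original `_get_unpad_data` symbol (whose exact module path varies across transformers versions).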
-
As far as I understand, when using cross-attention we first compute `qkv = self.qkv(hidden_states)`, and then `cross_qkv = self.qkv(encoder_output)`. But later only `q` from `qkv` is used, and only `k…
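To make sure I am reading it right, here is the pattern as I understand it (a minimal sketch with my own names, not the actual code):
```
import torch.nn as nn

class FusedQKVCrossAttnProj(nn.Module):
    # Hypothetical illustration: the fused qkv projection is applied to both
    # streams, and the unused outputs are computed only to be discarded.
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)

    def forward(self, hidden_states, encoder_output):
        q, _, _ = self.qkv(hidden_states).chunk(3, dim=-1)   # k, v wasted
        _, k, v = self.qkv(encoder_output).chunk(3, dim=-1)  # q wasted
        return q, k, v
```
If that reading is right, two thirds of each projection's output is thrown away.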
-
Thanks for your great work! I have a question about your implementation:
- In Figure 3, the embedded road map interacts with the Seq Embedding and is then added to the input feature. However, I find the em…
-
May I ask whether this code is incomplete? I am a bit confused about the part that does cross attention between the image and text features. On top of the existing code I added
```
def save_attention_hook(module, input, output):
    # assumes the module returns (output, attention_scores)
    attention_scores = output[1]
    module.save_attention_map(attention_scores)
```
to capture the attention scores, but…
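For completeness, this is how I register the hook (the `cross_attn` name is my guess at the relevant modules, and the attention layer must actually return the scores as its second output, e.g. when called with something like `output_attentions=True`):
```
# Hypothetical registration; "cross_attn" must match the real module names,
# and the module must return (output, attention_scores) for output[1] to work.
for name, module in model.named_modules():
    if "cross_attn" in name:
        module.register_forward_hook(save_attention_hook)
```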
-
This issue is not in response to a performance regression.
The method of performing cross-attention QKV computations introduced in #4942 could be improved. Because this issue relates to cross-atten…
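To make the suggestion concrete, here is one possible shape of the improvement, sketched under my own assumptions rather than what #4942 actually does: project queries and key/values separately so neither stream computes outputs it discards.
```
import torch.nn as nn

class SplitCrossAttnProj(nn.Module):
    # Queries come from the decoder stream, keys/values from the encoder
    # stream; nothing is projected and then thrown away.
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.kv_proj = nn.Linear(dim, dim * 2)

    def forward(self, hidden_states, encoder_output):
        q = self.q_proj(hidden_states)
        k, v = self.kv_proj(encoder_output).chunk(2, dim=-1)
        return q, k, v
```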
-
Hi. Thanks for the great work.
Why is the pos_enc in cross-attention only used for the keys, and not the queries? (see config files)
```
pos_enc_at_cross_attn_keys: true
pos_enc_at_c…
```
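To make the question concrete, this is what I understand the two flags to control (a sketch with my own names, not the repo's code):
```
import torch.nn.functional as F

def cross_attn(q, k, v, q_pos, k_pos,
               pos_enc_at_keys=True, pos_enc_at_queries=False):
    # The config flags gate whether positional encodings are added to the
    # keys and/or the queries before the attention is computed.
    if pos_enc_at_queries:
        q = q + q_pos
    if pos_enc_at_keys:
        k = k + k_pos
    return F.scaled_dot_product_attention(q, k, v)
```
So with the config above, only the keys carry positional information into the cross-attention, and I would like to understand why that asymmetry is the right choice.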
-
Hi,
Thank you for releasing your code. I would like to understand where the decoupled cross-attention is used in the code, as stated in the paper. In the code, I only see concatenation. I wou…
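For reference, my mental model of the two options, following the IP-Adapter-style formulation of decoupled cross-attention (an assumption on my part, so please correct me if the paper means something else):
```
import torch
import torch.nn.functional as F

def decoupled_cross_attention(q, k_text, v_text, k_image, v_image, scale=1.0):
    # Decoupled: two separate attention operations whose outputs are summed.
    out_text = F.scaled_dot_product_attention(q, k_text, v_text)
    out_image = F.scaled_dot_product_attention(q, k_image, v_image)
    return out_text + scale * out_image

def concat_cross_attention(q, k_text, v_text, k_image, v_image):
    # Concatenation: a single attention over the joined key/value sequences.
    k = torch.cat([k_text, k_image], dim=-2)
    v = torch.cat([v_text, v_image], dim=-2)
    return F.scaled_dot_product_attention(q, k, v)
```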
-
Hello,
Thank you so much for your great work and codebase!
I would appreciate your clarifications on a few items.
1) From within `TextToVideoSDPipelineCall.py`, at this [line](https://g…
-
First, a thumbs-up for your work! But I have a question: the paper mentions decomposing cross-attention into space and channels. What is the difference between these two, and why is it called space…
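To show where my confusion is, this is how I currently picture the two variants (shapes and scaling are my own guesses and may not match the paper):
```
import torch

B, C, H, W = 2, 64, 16, 16
x = torch.randn(B, C, H * W)  # features flattened over spatial positions

# "Space": tokens are the H*W positions; similarity is taken over positions.
spatial_attn = torch.softmax(
    x.transpose(1, 2) @ x / (C ** 0.5), dim=-1)        # (B, HW, HW)

# "Channel": tokens are the C channels; similarity is taken over channels.
channel_attn = torch.softmax(
    x @ x.transpose(1, 2) / ((H * W) ** 0.5), dim=-1)  # (B, C, C)
```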
-
Hello, I recently implemented a cross-attention application with multi-modal fusion, but because the image resolution is too large, a CUDA OOM occurs when computing q and k, so I found your paper…
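For context, this is the kind of query chunking I tried as a workaround, so that only a slice of the score matrix is alive at any time (my own sketch, not from the paper):
```
import torch
import torch.nn.functional as F

def chunked_cross_attention(q, k, v, chunk_size=1024):
    # q: (B, H, N_q, D), k/v: (B, H, N_k, D). Queries are processed in
    # chunks so the full (N_q x N_k) attention matrix is never materialized
    # at once; each chunk also benefits from the fused SDPA kernel.
    outs = []
    for i in range(0, q.size(-2), chunk_size):
        outs.append(F.scaled_dot_product_attention(q[..., i:i + chunk_size, :], k, v))
    return torch.cat(outs, dim=-2)
```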