-
When expanding the key/value heads to the number of query heads in a GQA architecture, the vectors need to be cloned in an interleaved way, not just repeated.
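A minimal sketch of the difference, assuming PyTorch and a hypothetical expansion factor `n_rep` (the names and sizes here are mine): `repeat_interleave` keeps the copies of each KV head adjacent, which matches the layout standard GQA expansions (e.g. Llama-style `repeat_kv`) produce, while a plain `repeat`/tile cycles through the heads instead.
```python
import torch

num_kv_heads, n_rep, head_dim = 2, 3, 4  # hypothetical sizes
# Each row holds one KV head; row values equal the head index for visibility.
kv = torch.arange(num_kv_heads).float().view(num_kv_heads, 1).expand(num_kv_heads, head_dim)

# Interleaved cloning: each KV head's copies stay adjacent -> heads [0, 0, 0, 1, 1, 1]
interleaved = kv.repeat_interleave(n_rep, dim=0)

# Plain repeating: cycles through the heads instead -> heads [0, 1, 0, 1, 0, 1]
repeated = kv.repeat(n_rep, 1)

print(interleaved[:, 0].tolist())  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(repeated[:, 0].tolist())     # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```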
-
When we enable flash attention, it is hard to debug if the results do not match.
So we'd like to add prints to make the debugging process easier.
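For example, something along these lines (a sketch only, using PyTorch's `scaled_dot_product_attention` as a stand-in for the flash path; tensor shapes are hypothetical):
```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Fused / flash-style path.
out_flash = F.scaled_dot_product_attention(q, k, v)

# Naive float32 reference for comparison.
qf, kf, vf = q.float(), k.float(), v.float()
attn = torch.softmax(qf @ kf.transpose(-2, -1) / qf.shape[-1] ** 0.5, dim=-1)
out_ref = attn @ vf

# Print the mismatch so it is easy to see where the results diverge.
diff = (out_flash.float() - out_ref).abs()
print(f"max abs diff: {diff.max().item():.6f}, mean abs diff: {diff.mean().item():.6f}")
```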
-
# 🐛 Bug
If I use MemoryEfficientAttentionFlashAttentionOp as my attention op for memory-efficient attention and also pass an attention bias, it gives me errors :(
## Command
```
import math
impor…
```
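A minimal repro of the setup I am describing (a sketch with hypothetical shapes; the sizes and the bias tensor are not from my actual code):
```python
import torch
import xformers.ops as xops

B, M, H, K = 1, 128, 8, 64  # hypothetical batch, seq len, heads, head dim
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
bias = torch.randn(B, H, M, M, device="cuda", dtype=torch.float16)

# Passing a tensor attention bias while forcing the flash op is what errors out for me.
out = xops.memory_efficient_attention(
    q, k, v,
    attn_bias=bias,
    op=xops.MemoryEfficientAttentionFlashAttentionOp,
)
```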
-
Hi, I would like to ask about the Deformable Attention mechanism in the paper.
I went through the paper DEFORMABLE DETR: DEFORMABLE TRANSFORMERS FOR END-TO-END OBJECT DETECTION, and the Deformable Atten…
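For reference, the deformable attention operator the paper defines, as I understand it, is

$$
\mathrm{DeformAttn}(z_q, p_q, x) = \sum_{m=1}^{M} W_m \left[ \sum_{k=1}^{K} A_{mqk} \cdot W'_m\, x\!\left(p_q + \Delta p_{mqk}\right) \right]
$$

where $M$ is the number of attention heads, $K$ the number of sampled keys per head, $A_{mqk}$ the normalized attention weights, $\Delta p_{mqk}$ the learned sampling offsets around the reference point $p_q$, and $W_m$, $W'_m$ learned projections.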
-
### System Info
- `transformers` version: 4.44.2
- Platform: Linux-6.8.0-40-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.24.6
- Safetensors version: 0.…
-
Cross-Layer Attention (CLA), recently proposed by MIT, can significantly reduce runtime KV-cache memory usage by sharing the KV cache across adjacent layers. Does vLLM have any plans to support it? Thanks!
Cross-Layer Attention paper: https://arxiv.or…
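A rough sketch of the idea (not vLLM code; the module structure and names are mine): only layers that produce KV need a cache, and the following layer reuses it.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CLABlock(nn.Module):
    """Attention block that either produces KV or reuses KV from an earlier layer."""

    def __init__(self, dim: int, num_heads: int, produces_kv: bool):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q_proj = nn.Linear(dim, dim)
        self.o_proj = nn.Linear(dim, dim)
        self.produces_kv = produces_kv
        if produces_kv:
            self.kv_proj = nn.Linear(dim, 2 * dim)

    def forward(self, x, shared_kv=None):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        if self.produces_kv:
            k, v = self.kv_proj(x).chunk(2, dim=-1)
            k = k.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
            v = v.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
            shared_kv = (k, v)  # only this layer's KV would need to be cached
        else:
            k, v = shared_kv  # reuse the producing layer's KV (and its KV cache)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1)), shared_kv

# Usage sketch: even layers produce KV, odd layers reuse it, so roughly half of
# the layers need a KV cache at all.
dim, heads = 64, 4
layers = nn.ModuleList([CLABlock(dim, heads, produces_kv=(i % 2 == 0)) for i in range(4)])
x, kv = torch.randn(2, 16, dim), None
for layer in layers:
    x, kv = layer(x, shared_kv=kv)
```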
-
### Your current environment
code review
### 🐛 Describe the bug
In `flash_attn.py`, `forward` function:
```
else:
    # prefix-enabled attention
    assert prefill_m…
```
-
When I'm trying to use Videocrafter 2, I get this error:
F:\Pinokio\api\videocrafter2.git\app\env\lib\site-packages\torch\nn\functional.py:5560: UserWarning: 1Torch was not compiled with flash att…
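For what it's worth, this is how I checked which `scaled_dot_product_attention` backends my install has enabled (a sketch, assuming a CUDA build of PyTorch); the warning just means SDPA falls back to a non-flash backend:
```python
import torch
import torch.nn.functional as F

# Which scaled_dot_product_attention backends are currently enabled?
print("flash:", torch.backends.cuda.flash_sdp_enabled())
print("mem_efficient:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math:", torch.backends.cuda.math_sdp_enabled())

q = torch.randn(1, 8, 64, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# SDPA still runs when flash is unavailable; it just uses another backend.
with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=True):
    out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)
```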
-
### Describe the bug
I accidentally introduced a bug in this [PR](https://github.com/huggingface/diffusers/pull/5181) by adding a condition on [this line](https://github.com/huggingface/diffusers/blo…
-
Dear Author,
I am trying to locate the section of the code that handles the cross-attention layer between the text embeddings and the visual embeddings. Could you please guide me to the relevant part of the c…
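Concretely, I am looking for the layer that does something like the following (a generic sketch, not your code; all names are mine), with queries from the visual tokens and keys/values from the text tokens:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    """Generic cross-attention: visual tokens attend to text tokens."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.to_q = nn.Linear(dim, dim)       # queries from visual embeddings
        self.to_kv = nn.Linear(dim, 2 * dim)  # keys/values from text embeddings
        self.to_out = nn.Linear(dim, dim)

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        B, Tv, _ = visual.shape
        q = self.to_q(visual).view(B, Tv, self.num_heads, self.head_dim).transpose(1, 2)
        k, v = self.to_kv(text).chunk(2, dim=-1)
        k = k.view(B, -1, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, -1, self.num_heads, self.head_dim).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v)
        return self.to_out(out.transpose(1, 2).reshape(B, Tv, -1))
```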