-
https://arxiv.org/abs/1812.01243
-
I have seen many users in the community run into compilation and build problems, and frequent failures, especially when using new versions of CUDA or PyTorch, or when wanting to update CUTLASS or Flash Attentio…
-
Namespace(confidence_threshold=0.2, config_file='configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py', input=['…
-
MLA (Multi-head Latent Attention) was proposed in [DeepSeek-V2](https://github.com/deepseek-ai/DeepSeek-V2/blob/main/deepseek-v2-tech-report.pdf) for efficient inference.
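The key idea is to compress keys and values into a small shared latent vector and cache only that latent during decoding. A minimal PyTorch sketch of just the low-rank KV compression (it omits the decoupled rotary key and the query-side compression of the full design; module and dimension names are placeholders):
```
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentKVAttention(nn.Module):
    """Sketch of low-rank KV compression in the spirit of MLA (hypothetical sizes).

    Instead of caching full per-head keys/values, only the small latent c_kv is
    cached and expanded to K/V on the fly, shrinking the KV cache by roughly
    d_model / d_latent.
    """

    def __init__(self, d_model=1024, n_heads=8, d_latent=128):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compress hidden state -> latent
        self.k_up = nn.Linear(d_latent, d_model)     # expand latent -> per-head keys
        self.v_up = nn.Linear(d_latent, d_model)     # expand latent -> per-head values
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, kv_cache=None):
        b, t, _ = x.shape
        c_kv = self.kv_down(x)                       # (b, t, d_latent): this is all that gets cached
        if kv_cache is not None:
            c_kv = torch.cat([kv_cache, c_kv], dim=1)
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(c_kv).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(c_kv).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        o = F.scaled_dot_product_attention(q, k, v)  # causal masking omitted for brevity
        o = o.transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(o), c_kv                # return the latent as the new cache
```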
-
# 🐛 Bug
torch.jit.trace breaks with the following error:
`RuntimeError: unsupported output type: int, from operator: xformers::efficient_attention_forward_generic`
The output of the ops conta…
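For context, a minimal sketch of the kind of call that triggers this (the wrapper function and tensor sizes are my own, not taken from the report): tracing any function that reaches the xformers custom op fails because one of the op's outputs is a plain Python int.
```
import torch
import xformers.ops as xops

def attn(q, k, v):
    return xops.memory_efficient_attention(q, k, v)

q = k = v = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
# Reported failure mode: RuntimeError: unsupported output type: int,
# from operator: xformers::efficient_attention_forward_generic
traced = torch.jit.trace(attn, (q, k, v))
```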
-
### 🐛 Describe the bug
Looks like it's dispatching to efficient attention backward and failing one of the shape checks (
```
TORCH_CHECK(
    max_seqlen_k
```
-
- [pdf](https://arxiv.org/pdf/2209.07484.pdf)
-
Hello, I am using `torch.cuda.amp.autocast` with `bfloat16`.
I noticed that the xformers `RotaryEmbedding` produces `float32` outputs, which then require casting before passing to `memory_efficien…
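As a stopgap, the cast can be made explicit right after the rotary step; a minimal sketch, assuming the rotary module takes and returns a (query, key) pair (names and shapes here are illustrative, not the exact setup from this report):
```
import xformers.ops as xops

def rotary_then_attention(q, k, v, rotary):
    # Under autocast(bfloat16) the rotary module below returns float32 tensors.
    q_rot, k_rot = rotary(q, k)
    # memory_efficient_attention expects q/k/v to share a dtype, so cast the
    # rotated tensors back to the value dtype before the attention call.
    q_rot = q_rot.to(v.dtype)
    k_rot = k_rot.to(v.dtype)
    return xops.memory_efficient_attention(q_rot, k_rot, v)
```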
-
### 🚀 The feature, motivation and pitch
Enable support for the Flash Attention, Memory Efficient, and SDPA kernels on AMD GPUs.
At present, using these gives the warning below with the latest nightlies (torch==…
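For reference, one way to see which SDPA backends are actually usable on a given build is to force each one in turn and check whether the call succeeds or falls back; a rough sketch (backend availability on ROCm is exactly the open question here, so treat the output as diagnostic):
```
import torch
import torch.nn.functional as F

q = k = v = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)

for name, flags in [
    ("flash", dict(enable_flash=True, enable_mem_efficient=False, enable_math=False)),
    ("mem_efficient", dict(enable_flash=False, enable_mem_efficient=True, enable_math=False)),
    ("math", dict(enable_flash=False, enable_mem_efficient=False, enable_math=True)),
]:
    try:
        # Restrict SDPA dispatch to a single backend inside the context manager.
        with torch.backends.cuda.sdp_kernel(**flags):
            F.scaled_dot_product_attention(q, k, v)
        print(f"{name}: ok")
    except RuntimeError as e:
        print(f"{name}: unavailable ({e})")
```
Newer PyTorch builds expose the same backend selection as `torch.nn.attention.sdpa_kernel`.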
-
From https://arxiv.org/abs/2112.05682v2. I have no immediate use for this, but it looks cool and I didn't want it to go unmentioned in case some aspiring contributor to Transformers.jl is looking for …
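The trick from that paper is compact enough to sketch; here it is in plain NumPy (the Julia port is left out of scope): stream over key/value chunks while keeping a running row max, numerator, and normalizer, so the full n×n score matrix never materializes.
```
import numpy as np

def chunked_attention(q, k, v, chunk=128):
    """O(n)-memory attention: accumulate the softmax numerator and denominator
    over key/value chunks, with a running max for numerical stability."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    num = np.zeros((n, v.shape[1]))           # running weighted sum of values
    den = np.zeros((n, 1))                    # running softmax normalizer
    m = np.full((n, 1), -np.inf)              # running row-wise max of scores
    for start in range(0, k.shape[0], chunk):
        kc, vc = k[start:start + chunk], v[start:start + chunk]
        s = (q @ kc.T) * scale                # (n, chunk) scores for this chunk
        m_new = np.maximum(m, s.max(axis=1, keepdims=True))
        correction = np.exp(m - m_new)        # rescale old accumulators to the new max
        p = np.exp(s - m_new)
        num = num * correction + p @ vc
        den = den * correction + p.sum(axis=1, keepdims=True)
        m = m_new
    return num / den

# Quick check against the naive quadratic-memory implementation.
q, k, v = (np.random.randn(64, 32) for _ in range(3))
s = (q @ k.T) / np.sqrt(32)
w = np.exp(s - s.max(axis=1, keepdims=True))
ref = (w / w.sum(axis=1, keepdims=True)) @ v
assert np.allclose(chunked_attention(q, k, v, chunk=16), ref, atol=1e-6)
```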