-
We recently released [FlexAttention](https://pytorch.org/blog/flexattention/), which automatically generates fused FlashAttention kernels for a diverse range of attention variants.
For example, the…
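A minimal sketch of what a FlexAttention call can look like, assuming a recent PyTorch (2.5+) where `torch.nn.attention.flex_attention` is available; the causal `score_mod` below is just one illustrative variant:
```
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal_score_mod(score, b, h, q_idx, kv_idx):
    # Keep scores for positions the query may attend to; send future
    # positions to -inf before the softmax.
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

q, k, v = (torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

# flex_attention is typically wrapped in torch.compile to get the fused kernel.
out = torch.compile(flex_attention)(q, k, v, score_mod=causal_score_mod)
```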
-
We aim to reach 80%+ of XeTLA performance,
using `python/tutorials/06-fused-attention.py` as the test case (a rough benchmarking sketch follows the issue list below).
- #912
- #913
- #914
- #915
- #916
- #917
- #1102
- #1103
- #1192
(batch head n_ctx d…
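A rough sketch of how such a number is typically measured, using the standard attention FLOP count and `triton.testing.do_bench`; the shapes and the use of torch SDPA as the timed function are placeholders, not this issue's exact setup:
```
import torch
import triton

BATCH, H, N_CTX, D_HEAD = 4, 48, 4096, 64  # placeholder shapes
q, k, v = (torch.randn(BATCH, H, N_CTX, D_HEAD, device="cuda", dtype=torch.float16)
           for _ in range(3))

# Median runtime in ms of the attention forward pass under test.
ms = triton.testing.do_bench(
    lambda: torch.nn.functional.scaled_dot_product_attention(q, k, v))

# Two batched matmuls (QK^T and P@V): 4 * B * H * N_CTX^2 * D_HEAD FLOPs.
flops = 4 * BATCH * H * N_CTX * N_CTX * D_HEAD
print(f"{flops / (ms * 1e-3) / 1e12:.1f} TFLOP/s")
```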
-
### Request description
An end-to-end (E2E) test suite for Attention that includes a reference implementation.
### What component(s) does this issue relate to?
Compiler
### Additional context
I ra…
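A minimal sketch of what such an E2E test could look like; `fused_attention` is a placeholder for the kernel under test, and the device and tolerances are illustrative:
```
import math
import torch

def reference_attention(q, k, v):
    # Plain softmax(QK^T / sqrt(d)) @ V reference, computed in fp32.
    scores = (q.float() @ k.float().transpose(-2, -1)) / math.sqrt(q.shape[-1])
    return (scores.softmax(dim=-1) @ v.float()).to(q.dtype)

def test_attention(fused_attention, shape=(2, 8, 1024, 64), dtype=torch.float16):
    device = "cuda"  # or "xpu" on the Intel backend
    q, k, v = (torch.randn(*shape, device=device, dtype=dtype) for _ in range(3))
    out = fused_attention(q, k, v)
    torch.testing.assert_close(out, reference_attention(q, k, v), atol=2e-2, rtol=0)
```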
-
At what position in the network was the attention module added when you conducted the experiments?
-
My device is a 4090 on the Hopper architecture, consistent with the H100 architecture. But the homepage says "Requirements: H100 / H800 GPU, CUDA >= 12.3."
I would like to know if flash attentio…
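For reference, the architecture can be checked directly: an RTX 4090 is Ada Lovelace and reports compute capability (8, 9), while H100/H800 are Hopper and report (9, 0):
```
import torch

major, minor = torch.cuda.get_device_capability()
print(torch.cuda.get_device_name(), (major, minor))
print("Hopper (sm_90):", (major, minor) == (9, 0))  # False on a 4090 (sm_89, Ada)
```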
-
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask`…
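A minimal sketch of silencing that warning by passing the tokenizer's mask through to `generate`; the model name is a placeholder:
```
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("Hello, world", return_tensors="pt")
out = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # pass the mask explicitly
    pad_token_id=tokenizer.eos_token_id,      # make padding unambiguous
    max_new_tokens=20,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```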
-
We do not have grouped attention for now, @rosenrodt. @asroy Do we need instances for grouped bmm+softmax+gemm+permute?
_Originally posted by @shaojiewang in https://github.com/ROCmSoftwarePlatform/composa…
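For context, the pattern under discussion (batched GEMM, softmax, GEMM, then a layout permute) looks roughly like this in plain PyTorch; the shapes are illustrative:
```
import torch

B, H, M, N, K, O = 2, 8, 512, 512, 64, 64  # illustrative sizes
q = torch.randn(B * H, M, K)
k = torch.randn(B * H, N, K)
v = torch.randn(B * H, N, O)

s = torch.bmm(q, k.transpose(1, 2)) * K ** -0.5  # batched GEMM: (B*H, M, N)
p = s.softmax(dim=-1)                            # softmax over keys
o = torch.bmm(p, v)                              # batched GEMM: (B*H, M, O)
o = o.view(B, H, M, O).permute(0, 2, 1, 3)       # permute to (B, M, H, O)
```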
-
Hi @cubiq,
Since the `SD3 Attention Seeker L/G` node adjusts CLIP L and CLIP G, does that mean it could also work with SDXL?
I tried it and it does something, but I don't know if it's working p…
-
Hi, thank you for the interesting work :)
I was wondering how the "attention heatmap" in the paper was drawn.
If I have understood your method correctly, the learnable parameters are only added to …
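Not necessarily the paper's exact procedure, but a common way such heatmaps are drawn is to take the attention weights, average them over heads, and plot the matrix:
```
import matplotlib.pyplot as plt
import torch

attn = torch.rand(8, 64, 64)  # (heads, query_len, key_len), placeholder weights
heatmap = attn.mean(dim=0)    # average over heads

plt.imshow(heatmap.numpy(), cmap="viridis")
plt.xlabel("key position")
plt.ylabel("query position")
plt.colorbar()
plt.savefig("attention_heatmap.png")
```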
-
Repro
```
import flash_attn
import torch
from einops import rearrange

def snr(a: torch.Tensor, b: torch.Tensor):
    # Signal-to-noise ratio between two tensors; identical tensors give +inf.
    if torch.equal(a, b):
        return float("inf")
    if a.dtype == t…