-
First, thank you for creating and releasing this invaluable resource.
# What I am trying to do
I would like to combine `kfac-jax` with [fused attention from `pallas`](https://github.com/google/…
-
Hi,
Thank you for your great work! It's really helpful in my research.
I'm interested in using NATTEN with linear attention, which can be simplified as re-associating the matmuls: `(q @ k.T) @ v -> q @ (k.T @ v)`. This approach …
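For concreteness, a minimal sketch of that re-association (shapes are illustrative, and real linear attention would also apply a feature map and a normalizer; this only demonstrates the associativity argument):
```python
import torch

# Toy shapes: sequence length n, head dimension d.
n, d = 1024, 64
q, k, v = (torch.randn(n, d) for _ in range(3))

# Quadratic form: materializes an (n, n) score matrix -> O(n^2 d) time, O(n^2) memory.
out_quadratic = (q @ k.T) @ v

# Linear form: only a (d, d) state matrix -> O(n d^2) time, O(d^2) memory.
out_linear = q @ (k.T @ v)

# Algebraically identical, up to floating-point error.
torch.testing.assert_close(out_quadratic, out_linear, rtol=1e-4, atol=1e-4)
```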
-
Hi,
I hit this issue on macOS:
```
/custom_nodes/comfyui-oms-diffusion/oms_diffusion_nodes.py", line 149, in get_area_and_mult
    conditioning["c_attn_stored_area"] = AttnStoredExtra(torch.te…
```
-
https://research.colfax-intl.com/flashattention-3-fast-and-accurate-attention-with-asynchrony-and-low-precision/
cc @yzh119
-
### ⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous [Ideas in Discussions](https://github.com/axolotl-ai-cloud/axolotl/discussions/categories/ideas) …
-
Repro:
```python
import flash_attn
import torch
from einops import rearrange

def snr(a: torch.Tensor, b: torch.Tensor):
    if torch.equal(a, b):
        return float("inf")
    if a.dtype == t…
```
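The helper is cut off above; for reference, a self-contained SNR metric along the same lines could look like this (a hypothetical completion, not the original repro's code; the decibel scaling and float32 upcast are my assumptions):
```python
import torch

def snr_db(reference: torch.Tensor, test: torch.Tensor) -> float:
    # Identical tensors have infinite SNR.
    if torch.equal(reference, test):
        return float("inf")
    # Upcast so the error power is not dominated by low-precision rounding.
    ref = reference.float()
    noise = ref - test.float()
    # Signal-to-noise ratio in decibels: signal power over error power.
    return (10.0 * torch.log10(ref.pow(2).mean() / noise.pow(2).mean())).item()
```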
-
When we use the fp8 data type, we found that the FFN GEMMs and the attention projections support real fp8 compute (this is supported on H20 and L20), but `Q @ transpose(K)` and `softmax @ V` inside attention don't support fp8 compute, …
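To make the gap concrete: PyTorch can store fp8 tensors, but plain `@` has no fp8 matmul kernel for these operands, so the score and value matmuls need dedicated fp8 kernel support. A minimal sketch of the situation (the dequantize-to-bf16 fallback is only illustrative, and per-tensor scaling is omitted):
```python
import torch

# fp8 storage works (PyTorch >= 2.1), but matmul rejects fp8 operands.
q = torch.randn(8, 128, 64)
k = torch.randn(8, 128, 64)
q_fp8 = q.to(torch.float8_e4m3fn)
k_fp8 = k.to(torch.float8_e4m3fn)

# q_fp8 @ k_fp8.transpose(-1, -2)  # raises: matmul is not implemented for fp8

# Without kernel support, the fallback is to dequantize to a wider dtype first,
# which gives up the fp8 compute throughput the question is asking about.
scores = q_fp8.to(torch.bfloat16) @ k_fp8.to(torch.bfloat16).transpose(-1, -2)
```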
-
Hello, I recently implemented a cross-attention application with multi-modal fusion, but because the image resolution is very large, a CUDA OOM occurs when computing the `q @ k.T` scores, so I found your paper…
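For context on the OOM itself: materializing the full `(Nq, Nk)` score matrix is what exhausts memory, and chunking the queries bounds the peak. A minimal sketch of that workaround (the function name and chunk size are illustrative assumptions, not the paper's method):
```python
import torch

def chunked_cross_attention(q, k, v, chunk_size=1024):
    # Process queries in blocks so only a (chunk_size, Nk) score matrix
    # is live at any time, instead of the full (Nq, Nk) matrix.
    scale = q.shape[-1] ** -0.5
    outs = []
    for i in range(0, q.shape[-2], chunk_size):
        scores = (q[..., i:i + chunk_size, :] @ k.transpose(-1, -2)) * scale
        outs.append(torch.softmax(scores, dim=-1) @ v)
    return torch.cat(outs, dim=-2)
```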
-
If a document is open in a plank and a stack, attention jumps from the stack back to the standalone plank.
https://github.com/user-attachments/assets/5f81904d-c0ea-46a3-89db-dd715143fd53