-
### 🚀 The feature, motivation and pitch
Flash Attention 3 (https://github.com/Dao-AILab/flash-attention) has been in beta for some time. I tested it on H100 GPUs with CUDA 12.3 and also attempted a…
-
With https://github.com/tenstorrent/tt-metal/pull/12309, causal SDPA no longer accepts an attention mask. It instead generates its own causal mask. The PR only removed the attention mask from calls to…
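Not tt-metal's API, but the same pattern can be illustrated with PyTorch's SDPA: with `is_causal=True` the kernel builds the causal mask internally, matching what passing an explicit lower-triangular mask would produce (shapes below are made up for illustration).
```python
import torch
import torch.nn.functional as F

# Illustration with PyTorch's SDPA, not tt-metal's API.
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# Causal mask generated internally; no attn_mask tensor is passed.
out_internal = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Equivalent result with an explicit lower-triangular boolean mask,
# i.e. the pattern the PR removes from the causal path.
causal_mask = torch.tril(torch.ones(128, 128, dtype=torch.bool))
out_explicit = F.scaled_dot_product_attention(q, k, v, attn_mask=causal_mask)

assert torch.allclose(out_internal, out_explicit, atol=1e-6)
```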
-
Is there any way to add Flash Attention 2 support for this model? If there is a way to do it, I would love to get involved and help out!
I've tried to implement it by looking at [MusicGen's one](https://git…
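For reference (not confirmed for this model), this is how Flash Attention 2 is normally enabled in transformers for architectures that already support it; the checkpoint name below is a placeholder:
```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoint; this only works for architectures
# that already have Flash Attention 2 support in transformers.
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-model",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="cuda",
)
```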
-
### 🐛 Describe the bug
When I use flex attention on a single RTX 4090, I get an error.
A minimal repro:
```python
import torch
from torch.nn.attention.flex_attention import flex_attention
flex_at…
```
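Since the repro above is cut off, here is a minimal sketch of what a flex_attention call generally looks like; the shapes, dtype, and identity score_mod are assumptions, not the original repro:
```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Assumed shapes: (batch, heads, seq_len, head_dim); the original repro is truncated.
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

def identity_score_mod(score, b, h, q_idx, kv_idx):
    # No-op score modification, just to exercise the API.
    return score

out = flex_attention(q, k, v, score_mod=identity_score_mod)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```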
-
Attention Markers can be incorrect at times because the prediction is not simulated the same way it actually plays out in real time. Things like PlanningView and OnDrawSelected tend to find targets b…
-
Where in the network did you place the attention module in your experiments?
-
# 🚀 Feature
Support Flash Attention 3
## Motivation
Flash Attention 3 has been shown to be significantly faster than Flash Attention 2 on H100 GPUs.
## Pitch
Offer Flash Attention 3 support
-
### The model to consider.
https://huggingface.co/dunzhang/stella_en_1.5B_v5
```python
last_hidden_state = model(**input_data)[0]
```
In the model's `__init__`:
```python
vector_linear = torch.nn.Linear(in_features=model.conf…
```
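For context, a minimal sketch of the pattern those lines describe: pooling the last hidden state and projecting it through `vector_linear`. The dimensions here are assumptions for illustration, not values read from the model config:
```python
import torch

# Assumed sizes, not taken from the actual model config.
hidden_size, vector_dim = 1536, 1024
vector_linear = torch.nn.Linear(in_features=hidden_size, out_features=vector_dim)

# last_hidden_state: (batch, seq_len, hidden_size); attention_mask: (batch, seq_len)
last_hidden_state = torch.randn(2, 8, hidden_size)
attention_mask = torch.ones(2, 8)

# Mean-pool over non-padding tokens, then project to the embedding dimension.
pooled = (last_hidden_state * attention_mask.unsqueeze(-1)).sum(1) / attention_mask.sum(1, keepdim=True)
embeddings = vector_linear(pooled)  # (batch, vector_dim)
```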
-
Great job! I wonder if you could kindly open-source the code for visualizing the attention map? Looking forward to your response! Thanks so much!
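Not the authors' code, but a generic sketch of how an attention map is often rendered as a heatmap; the weights below are random placeholders:
```python
import torch
import matplotlib.pyplot as plt

# Fake attention weights for illustration: (query_len, key_len), rows sum to 1.
attn = torch.softmax(torch.randn(10, 10), dim=-1)

plt.imshow(attn.numpy(), cmap="viridis")
plt.colorbar(label="attention weight")
plt.xlabel("key position")
plt.ylabel("query position")
plt.title("Attention map")
plt.savefig("attention_map.png", dpi=150)
```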
-
### 🚀 The feature, motivation and pitch
I am working on adjustments to radix attention now. Thank you for your support of radix attention. Currently, caching for A that allows for more efficien…