-
### Search before asking
- [x] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### YOLOv8 Component
Integrations, Other
###…
-
Attention Markers can be incorrect at times because the prediction is not simulated the same way it actually plays out in real time. Things like PlanningView and OnDrawSelected tend to find targets b…
-
# 🚀 Feature
Support Flash Attention 3
## Motivation
Flash Attention 3 has been shown to deliver large speedups over Flash Attention 2 on H100 GPUs.
## Pitch
Add Flash Attention 3 support.
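Not tied to a specific integration point, but a rough sketch of how the kernel choice could be gated at import time. `flash_attn_interface` as the FA3 module name is an assumption about the beta package; `flash_attn.flash_attn_func` is the existing FA2 entry point.
```python
import torch

def pick_flash_attn_backend():
    """Rough sketch: prefer FA3 on Hopper (SM 9.x) when its package is installed,
    otherwise fall back to the existing FlashAttention-2 kernel.
    Assumes a CUDA device is available."""
    major, _ = torch.cuda.get_device_capability()
    if major >= 9:
        try:
            # Assumed FA3 beta import path; the actual module name may differ.
            from flash_attn_interface import flash_attn_func as flash_attn_3_func
            return "flash_attn_3", flash_attn_3_func
        except ImportError:
            pass
    from flash_attn import flash_attn_func  # FlashAttention-2
    return "flash_attn_2", flash_attn_func
```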
-
### Description of the bug
Not really a bug, but the scrolling in the layer view used to be a lot better.
I am not sure if this issue is particular to different mouse models, but it used to be that fo…
-
Opening this issue so we don't forget: Once #1545 is merged, let's also add sliding window attention to Mistral 0.1
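For reference, a minimal sketch of the sliding-window causal mask this would need (window size and sequence length below are just illustrative):
```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where position i may attend to position j only if
    j <= i (causal) and i - j < window (sliding window)."""
    idx = torch.arange(seq_len)
    rel = idx.unsqueeze(1) - idx.unsqueeze(0)   # rel[i, j] = i - j
    return (rel >= 0) & (rel < window)

# Tiny illustration; each row i marks which past positions token i can attend to.
mask = sliding_window_causal_mask(seq_len=6, window=3)
```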
-
Currently, the functions exist in the `_attention.py` file but are not *explicitly* exported. However, a lot of people want to write their own custom MHA implementations and could use these functions.
(I…
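As a rough illustration of the kind of custom MHA people want to assemble, with PyTorch's built-in `scaled_dot_product_attention` standing in for the helpers that are currently private in `_attention.py`:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CustomMHA(nn.Module):
    """Minimal custom multi-head attention; the score computation is the part
    users would like to swap for the currently-private attention functions."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim)
        q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # stand-in
        out = out.transpose(1, 2).reshape(b, t, d)
        return self.proj(out)
```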
-
### 🚀 Feature
Add more options for choosing attention implementation:
- Auto/None
- Eager
- SDPA
- FA2
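A rough sketch of how such an option could dispatch to the corresponding kernels; the function and option names here are illustrative, not an existing API:
```python
import torch
import torch.nn.functional as F

def attention(q, k, v, impl: str = "auto", causal: bool = True):
    """q, k, v: (batch, heads, seq, head_dim). `impl` selects the backend."""
    if impl in ("auto", "none"):
        impl = "sdpa" if hasattr(F, "scaled_dot_product_attention") else "eager"
    if impl == "eager":
        scores = (q @ k.transpose(-2, -1)) * q.shape[-1] ** -0.5
        if causal:
            t = q.shape[-2]
            future = torch.triu(torch.ones(t, t, dtype=torch.bool, device=q.device), diagonal=1)
            scores = scores.masked_fill(future, float("-inf"))
        return scores.softmax(dim=-1) @ v
    if impl == "sdpa":
        return F.scaled_dot_product_attention(q, k, v, is_causal=causal)
    if impl == "fa2":
        # flash_attn_func expects (batch, seq, heads, head_dim)
        from flash_attn import flash_attn_func
        out = flash_attn_func(q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), causal=causal)
        return out.transpose(1, 2)
    raise ValueError(f"unknown attention implementation: {impl}")
```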
-
Where in the network is the attention module placed in your experiments?
-
I reviewed the code of modeling_qwen.py, and I noticed that, within the lookahead process, the draft_ids matched from the TrieTree are such that the attention_mask and position ids associated with the…
-
In our paper we only showed results on causal language models, which use causally masked (decoder) self-attention.
If you'd like to use ALiBi for seq2seq tasks such as translation, speech or T5, o…
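For context, a minimal sketch of the causal (decoder) ALiBi bias described in the paper, using the geometric head slopes 2^(-8h/H):
```python
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric slopes from the paper: 2^(-8*h/n_heads) for h = 1..n_heads
    # (exact when n_heads is a power of two).
    return torch.tensor([2 ** (-8 * h / n_heads) for h in range(1, n_heads + 1)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Causal ALiBi bias of shape (n_heads, seq_len, seq_len):
    bias[h, i, j] = -slope_h * (i - j) for j <= i, and -inf for j > i."""
    pos = torch.arange(seq_len)
    dist = pos.unsqueeze(1) - pos.unsqueeze(0)            # dist[i, j] = i - j
    bias = -alibi_slopes(n_heads).view(-1, 1, 1) * dist   # linear penalty on distance
    bias = bias.masked_fill(dist < 0, float("-inf"))      # causal mask on future keys
    return bias

# Added to the attention logits before softmax:
# scores = (q @ k.transpose(-2, -1)) * head_dim ** -0.5 + alibi_bias(n_heads, seq_len)
```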