-
Would it be possible to add functionality for **Grad-CAM** or **attention maps** similar to those used in DINO?
Thank you!
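For reference, a minimal sketch of DINO-style [CLS] attention maps, assuming the facebookresearch/dino hub model and its `get_last_selfattention` helper; the input tensor and sizes below are placeholders:

```python
# Sketch: DINO-style [CLS] attention maps via the hub model's
# get_last_selfattention helper (from facebookresearch/dino).
import torch
import torch.nn.functional as F

model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

patch_size = 16
img = torch.randn(1, 3, 224, 224)  # placeholder; use a real, normalized image

with torch.no_grad():
    attn = model.get_last_selfattention(img)  # (1, heads, tokens, tokens)

heads = attn.shape[1]
w = h = 224 // patch_size  # 14x14 patch grid
# attention of the [CLS] token (index 0) to every patch token
cls_attn = attn[0, :, 0, 1:].reshape(heads, w, h)
# upsample each head's map to image resolution for overlaying
maps = F.interpolate(cls_attn.unsqueeze(0), scale_factor=patch_size,
                     mode="nearest")[0]  # (heads, 224, 224)
```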
-
### ⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous [Ideas in Discussions](https://github.com/axolotl-ai-cloud/axolotl/discussions/categories/ideas) …
-
### Feature request
Hi, in the Medusa paper, they adopt tree attention and use typical sampling to increase the speedup, but in the current code base, I think it only uses argmax() and no tree attention. …
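For context, a minimal sketch of the typical-acceptance rule from the Medusa paper (accept a candidate token when its probability clears an entropy-dependent threshold, rather than taking the argmax); the `epsilon`/`delta` values are illustrative, not taken from this code base:

```python
# Sketch of Medusa-style typical acceptance for a single candidate token.
import torch

def typical_accept(logits: torch.Tensor, candidate_id: int,
                   epsilon: float = 0.3, delta: float = 0.09) -> torch.Tensor:
    """Accept the candidate if p(candidate) >= min(epsilon, delta * exp(-H(p))),
    where H(p) is the entropy of the next-token distribution.
    logits: (vocab_size,) for one position; epsilon/delta are illustrative."""
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-9)).sum()
    threshold = torch.minimum(torch.tensor(epsilon), delta * torch.exp(-entropy))
    return probs[candidate_id] >= threshold
```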
-
Hello, I have recently implemented a cross-attention application with multi-modal fusion, but because the image resolution is very large, a CUDA OOM occurs when computing the attention between q and k, so I found your paper…
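As a point of comparison, a minimal sketch that sidesteps this OOM by never materializing the full attention-score matrix, using PyTorch's fused `scaled_dot_product_attention` (which can dispatch to FlashAttention-style kernels); all shapes here are illustrative:

```python
# Sketch: fused attention that never builds the (Lq x Lkv) score matrix in
# HBM, via torch.nn.functional.scaled_dot_product_attention (PyTorch >= 2.0).
import torch
import torch.nn.functional as F

B, H, Lq, Lkv, D = 1, 8, 64_000, 64_000, 64  # e.g. tokens from a high-res image
q = torch.randn(B, H, Lq, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, H, Lkv, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, H, Lkv, D, device="cuda", dtype=torch.float16)

# dispatches to a flash / memory-efficient kernel when available
out = F.scaled_dot_product_attention(q, k, v)  # (B, H, Lq, D)
```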
-
Will there be 3090 support in Flash Attention 3 in the future?
-
"We do not have so many requests, actually.
We also have some internal discussions, but there are a lot of alternatives for the faster (lightweight) encoder and Squeezeformer does not come to a hig…
-
Maybe it is too niche, but we would be interested in a fused circular windowed attention.
Currently, we pad q, k, v, use the fused kernel, and crop (a sketch of this workaround follows below).
It would either help if k, v could have different di…
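A minimal sketch of that pad-then-crop workaround, assuming flash-attn 2's sliding-window API (`flash_attn_func` with `window_size`); a natively circular fused kernel would avoid the wasted compute on the padded queries:

```python
# Sketch: circular windowed attention emulated by wrap-padding q, k, v,
# running flash-attn's sliding-window kernel, and cropping the output.
import torch
from flash_attn import flash_attn_func  # flash-attn >= 2.3 for window_size

def circular_window_attn(q, k, v, w):
    """q, k, v: (B, L, H, D) fp16/bf16 CUDA tensors; every query attends to
    the w positions on each side, wrapping around the sequence ends."""
    wrap = lambda t: torch.cat([t[:, -w:], t, t[:, :w]], dim=1)  # circular pad
    out = flash_attn_func(wrap(q), wrap(k), wrap(v), window_size=(w, w))
    return out[:, w:-w]  # drop the padded queries
```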
-
Hi, I have an attention_mask mismatch problem in the cross attention.
Can you please explain this line:
`requires_attention_mask = "encoder_outputs" not in model_kwargs`?
Why does it come after this:
…
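A hedged illustration (not the transformers source) of what that line gates: when the caller already passes precomputed `encoder_outputs`, `generate()` cannot reliably infer an encoder attention mask from `input_ids`, so it skips building the default mask and expects `attention_mask` to be passed explicitly:

```python
# Sketch (not the transformers source) of the default-mask logic around
# `requires_attention_mask = "encoder_outputs" not in model_kwargs`.
import torch

def prepare_default_mask(input_ids, pad_token_id, model_kwargs):
    # With precomputed encoder_outputs there may be no encoder input_ids to
    # infer a mask from, so generate() does not require (or build) one.
    requires_attention_mask = "encoder_outputs" not in model_kwargs
    if requires_attention_mask and model_kwargs.get("attention_mask") is None:
        # default: attend to real tokens, ignore padding
        model_kwargs["attention_mask"] = (input_ids != pad_token_id).long()
    return model_kwargs
```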
-