-
Hello,
Please let me know how I can run Moondream2 using Flash Attention 1, since I am trying to run it on Kaggle or Colab with T4 GPUs, so Flash Attention 2 won't work.
You have just mentioned to use f…
-
I think having flash attention in `equinox` should be a critical issue, considering it is already natively built into torch.
While XLA is supposed to (in theory) do some of the fusion, and possibly …
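For reference, a minimal torch-side sketch of the natively built-in path mentioned above (assuming a recent PyTorch with `torch.nn.attention`; the closest JAX analogue, as far as I know, is `jax.nn.dot_product_attention` with a fused implementation):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# scaled_dot_product_attention dispatches to a fused flash kernel when
# dtype/shape/hardware allow; here we force that backend explicitly.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```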
-
With release 1.3.2, a backwards-incompatible change was necessary to comply with the latest release of LibLO, 0.32 (released 02/16/24). LibLO 0.32 has a breaking change that now requires all OSC paths to ha…
-
### Search before asking
- [x] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### YOLOv8 Component
Integrations, Other
###…
-
Flash Attention can only be used with fp16 and bf16, not with fp32. Therefore, we should make flash attention optional in our codebase so that one can deactivate it during inference in exchange for hi…
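A minimal sketch of what the optional path could look like, assuming a plain flag around PyTorch's SDPA (the flag name and function here are hypothetical, not the codebase's existing API):

```python
import torch
import torch.nn.functional as F

def attention(q, k, v, use_flash: bool = True):
    # Flash kernels only support fp16/bf16, so fall back to an explicit
    # implementation when the flag is off or the inputs are fp32.
    if use_flash and q.dtype in (torch.float16, torch.bfloat16):
        return F.scaled_dot_product_attention(q, k, v)
    scale = q.shape[-1] ** -0.5
    weights = torch.softmax((q @ k.transpose(-2, -1)) * scale, dim=-1)
    return weights @ v
```

During inference, passing `use_flash=False` would then keep the whole computation in the input precision (e.g. fp32).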
-
-
Hello, I would like to ask: what are the attention_cuda.py and attention_native.py files in the classification folder, and are they modules? I would be very grateful if your team could answer.
-
Dear authors,
Thanks for sharing the code. I would like to know how to generate attention maps.
-
I usually train models using instances on Vast.ai. My process did not change: I typically instantiate instances with Torch 2.1.1 and/or 2.2.0 and CUDA 12.1. I am using an RTX 3090.
As always, I run in…
-
### 🚀 Feature
Add more options for choosing attention implementation:
- Auto/None
- Eager
- SDPA
- FA2
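For illustration, a rough sketch of how such an option could dispatch (the function name and option strings are assumptions for this sketch, not an existing API):

```python
import torch
import torch.nn.functional as F

def attend(q, k, v, impl=None):
    # Hypothetical dispatcher over the options above; q/k/v are (B, H, S, D)
    # for the eager/SDPA paths.
    if impl in (None, "auto", "sdpa"):
        # Let PyTorch's SDPA pick the best available backend.
        return F.scaled_dot_product_attention(q, k, v)
    if impl == "eager":
        # Plain softmax attention: slower, but works in fp32 and is easy to debug.
        scale = q.shape[-1] ** -0.5
        return torch.softmax((q @ k.transpose(-2, -1)) * scale, dim=-1) @ v
    if impl == "fa2":
        # Needs the flash-attn package, fp16/bf16 inputs, and an Ampere+ GPU;
        # note flash_attn_func expects (B, S, H, D) layout instead.
        from flash_attn import flash_attn_func
        return flash_attn_func(q, k, v)
    raise ValueError(f"unknown attention implementation: {impl}")
```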