-
### Proposal to improve performance
My GPU is too old, so I can't install the flash_attn package.
So I want to use vllm.attention.ops.triton_flash_attention to replace the flash_attn package.
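As a rough sketch of what the substitution might look like (untested; the entry point `triton_attention` and its argument order are assumptions that differ between vLLM releases, so please check `vllm/attention/ops/triton_flash_attention.py` in the installed version before relying on it):
```
# Minimal sketch of swapping flash_attn for vLLM's Triton kernel.
# NOTE: the name `triton_attention` and its signature are assumptions
# taken from one vLLM release; verify against your installed copy.
import torch

try:
    from flash_attn import flash_attn_varlen_func as _flash_varlen  # CUDA kernel, needs Ampere+
    _BACKEND = "flash_attn"
except ImportError:
    from vllm.attention.ops.triton_flash_attention import triton_attention  # assumed entry point
    _BACKEND = "vllm_triton"


def varlen_attention(q, k, v, cu_seqlens_q, cu_seqlens_k,
                     max_seqlen_q, max_seqlen_k, causal=True):
    """q, k, v: packed (total_tokens, nheads, headdim) fp16/bf16 CUDA tensors."""
    scale = q.shape[-1] ** -0.5
    if _BACKEND == "flash_attn":
        return _flash_varlen(q, k, v, cu_seqlens_q, cu_seqlens_k,
                             max_seqlen_q, max_seqlen_k,
                             softmax_scale=scale, causal=causal)
    # Assumed vLLM Triton path: the kernel writes into a preallocated output
    # tensor and returns (output, encoded_softmax) in the release I checked.
    o = torch.empty_like(q)
    out, _ = triton_attention(q, k, v, o, cu_seqlens_q, cu_seqlens_k,
                              max_seqlen_q, max_seqlen_k, causal, scale, None)
    return out
```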
### Report of pe…
-
*** Support data submitted ***
Id: "902e83eb-d5bc-4469-8adb-b59308ba0655"
-
Dear Author,
Thank you for open-sourcing such a great piece of work. Could you please elaborate on the extent to which flash attention can bring speed and memory efficiency improvements to PTv3?
…
-
This should be able to work with just a powerful color tracking marker.
-
I'm trying to run inference with flash attention and I'm getting this error.
```
from flash_attn import flash_attn_func
import torch
def main():
    batch_size = 8
    seqlen_q = 1
    seq…
```
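For reference, a self-contained version of this kind of repro (the key/value length, head count, and head dimension below are assumed values, not the original poster's), using flash_attn 2.x's `flash_attn_func`, which expects (batch, seqlen, nheads, headdim) fp16/bf16 tensors on a CUDA device with compute capability 8.0 or newer:
```
# Hypothetical completion of the repro above; seqlen_k, nheads and head_dim
# are illustrative assumptions, not the original values.
import torch
from flash_attn import flash_attn_func


def main():
    batch_size = 8
    seqlen_q = 1          # single query token, e.g. autoregressive decoding
    seqlen_k = 128        # assumed key/value length
    nheads, head_dim = 16, 64

    q = torch.randn(batch_size, seqlen_q, nheads, head_dim,
                    dtype=torch.float16, device="cuda")
    k = torch.randn(batch_size, seqlen_k, nheads, head_dim,
                    dtype=torch.float16, device="cuda")
    v = torch.randn_like(k)

    out = flash_attn_func(q, k, v, causal=True)
    print(out.shape)  # (batch_size, seqlen_q, nheads, head_dim)


if __name__ == "__main__":
    main()
```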
-
Hi,
I have followed all the steps, but I still cannot zip the file in SoC mode.
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports.
…
-
cuda: 11.7
torch: 2.0.1
python: 3.10.9
release: flash_attn-2.3.5+cu117torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```
File "/home/.conda/envs/venv310/lib/python3.10/site-packages/transfo…
```
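A quick sanity check that the environment actually matches the wheel's build tags (cu117, torch 2.0, cxx11abi FALSE, cp310), since a mismatch between the prebuilt wheel and the local torch build is a common cause of import errors like the one above (a diagnostic sketch, not part of the original report):
```
# Verify the local environment against the wheel's build tags.
import sys
import torch

print(sys.version_info[:2])             # expect (3, 10) for cp310
print(torch.__version__)                # expect 2.0.x
print(torch.version.cuda)               # expect 11.7
print(torch.compiled_with_cxx11_abi())  # expect False for cxx11abiFALSE
```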
-
ValueError: You are attempting to perform batched generation with padding_side='right' this may lead to unexpected behaviour for Flash Attention version of Qwen2. Make sure to call `tokenizer.padding…
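For context, the usual remedy suggested by this transformers error is to pad on the left before batched generation; a minimal sketch, where the model id and prompts are illustrative placeholders rather than values from the original report:
```
# Left-padded batched generation sketch; model id and prompts are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"      # avoids the padding_side='right' warning

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    attn_implementation="flash_attention_2",
    device_map="auto",
)

prompts = ["Hello, how are you?", "Summarize flash attention in one sentence."]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```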
-
See repro here:
https://stackblitz.com/edit/vitejs-vite-eafhmw?file=src%2Fmain.tsx
And the recording:
https://github.com/pmndrs/uikit/assets/9379701/81269900-f3d8-433c-92d5-a8828a915bf6
To rep…