-
Hello Author:
In the `CrossAttention` class in utils.py, there is only one input parameter, x, so it actually computes self-attention. Is your code inconsistent with what the paper describes?
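For context, here is a minimal sketch (my own illustration, not the repository's code) of the distinction being asked about: with a single input x, queries, keys, and values all come from x, which is self-attention; a cross-attention module would additionally accept a context tensor.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Hypothetical sketch: cross-attention when `context` is given, self-attention otherwise."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, context=None):
        # With only x, keys/values come from x itself -> self-attention.
        context = x if context is None else context
        out, _ = self.attn(x, context, context)
        return out

attn = CrossAttention(dim=64)
x = torch.randn(2, 10, 64)    # queries
ctx = torch.randn(2, 20, 64)  # separate context for keys/values
print(attn(x).shape, attn(x, ctx).shape)  # both torch.Size([2, 10, 64])
```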
-
All Flux models (dev, schnell, and fp8 versions) report this error during conversion, whether using Dynamic or Static conversion:
File "K:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\…
-
Thanks for the incredibly clean repository!
I am Sayak from the [Diffusers](https://github.com/huggingface/diffusers) team at Hugging Face. My question is probably very naive, so I apologize for th…
-
In the Hugging Face "eager" Mistral implementation, a sliding window of size 2048 will mask 2049 tokens. This is also true for flash attention. In the current vLLM implementation, a window of 2048 will mas…
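For reference, a toy sketch (my own, with made-up sizes) of the off-by-one such reports usually come down to: for a sliding window of size W, does a query at position i keep W previous tokens plus itself (W + 1 positions) before masking the rest, or exactly W positions in total?

```python
import torch

seq_len, W = 8, 4                          # toy sizes, not the real 2048 window
i = torch.arange(seq_len).unsqueeze(1)     # query positions
j = torch.arange(seq_len).unsqueeze(0)     # key positions

keep_inclusive = (j <= i) & (j >= i - W)   # keeps i - W .. i      -> W + 1 visible tokens
keep_exclusive = (j <= i) & (j >  i - W)   # keeps i - W + 1 .. i  -> W visible tokens

print(keep_inclusive[-1].sum().item())     # 5
print(keep_exclusive[-1].sum().item())     # 4
```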
-
I have been informed that while Flash Attention is there, it's not being used -
https://github.com/oobabooga/text-generation-webui/issues/3759#issuecomment-2031180332
The post has a link to what has …
-
### 🐛 Describe the bug
I am running a FlexAttention operation and it returns different output shapes with and without compile. The correct output shapes are those returned without compile.
```P…
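A minimal shape-comparison sketch (my own, with made-up tensor sizes; assumes PyTorch ≥ 2.5 where flex_attention is available):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

# Made-up sizes; run on a device/backend where compiled flex_attention is supported.
B, H, S, D = 2, 4, 128, 64
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))

eager_out = flex_attention(q, k, v)                     # without compile
compiled_out = torch.compile(flex_attention)(q, k, v)   # with compile

# Both are expected to be (B, H, S, D); the report is that they differ.
print(eager_out.shape, compiled_out.shape)
```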
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related iss…
-
**Describe the bug**
I tried to use the LLaVA example and ran into a key-mismatch error. I am on the latest commit of the main branch (094d66b).
[rank0]: RuntimeError: Error(s) in loading state_dict for LLaVAMode…
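For what it's worth, this kind of error comes from strict state_dict loading: keys in the checkpoint don't line up with the module names in the model. A generic toy illustration (not the actual LLaVA model) of how to surface the mismatched keys instead of failing:

```python
import torch.nn as nn

# Checkpoint saved from a model whose parameter names differ from the target model.
saved = nn.Linear(4, 4)
state_dict = {f"proj.{k}": v for k, v in saved.state_dict().items()}  # keys: proj.weight, proj.bias

model = nn.Sequential(nn.Linear(4, 4))  # expects keys: 0.weight, 0.bias
# strict=True would raise "Error(s) in loading state_dict"; strict=False reports the mismatch.
result = model.load_state_dict(state_dict, strict=False)
print("missing keys:   ", result.missing_keys)     # ['0.weight', '0.bias']
print("unexpected keys:", result.unexpected_keys)  # ['proj.weight', 'proj.bias']
```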
-
Hi @cubiq,
Since the `SD3 Attention Seeker L/G` node adjusts CLIP L and CLIP G, does that mean it could also work with SDXL?
I tried it and it does something, but I don't know if it's working p…
-
Thank you for this amazing work!
I was wondering whether the FP8 implementation of FlashAttention-3 will be available for the public to use. My main concern is accuracy (block quant may have alleviated thi…
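Not FlashAttention-3's actual quantization scheme, but a generic sketch of why per-block scales tend to reduce FP8 error compared with a single per-tensor scale (assumes a PyTorch build with float8_e4m3fn):

```python
import torch

x = torch.randn(4096) * torch.rand(4096)   # values with varying magnitude
FP8_MAX = 448.0                             # largest magnitude representable in e4m3

def quant_dequant(t, scale):
    # Round-trip through float8_e4m3fn with the given scale(s).
    return (t / scale).to(torch.float8_e4m3fn).to(torch.float32) * scale

# One scale for the whole tensor.
per_tensor = quant_dequant(x, x.abs().max() / FP8_MAX)

# One scale per 128-element block.
blocks = x.view(-1, 128)
scales = blocks.abs().amax(dim=1, keepdim=True) / FP8_MAX
per_block = quant_dequant(blocks, scales).view(-1)

print("per-tensor error:", (x - per_tensor).abs().mean().item())
print("per-block error: ", (x - per_block).abs().mean().item())
```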