-
Hello authors, I have a question about Equation (4) in the paper:
$$\mathrm{S}_{\mathrm{sim}}(p)=\underset{i}{\mathrm{softmax}}\big(\{\langle F_{ref}^{1/8}(p),\,F_{src}^{1/8}(p_i)\rangle\}_{p…
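To make my reading of the equation concrete, here is a tiny PyTorch sketch: take the 1/8-resolution reference feature at pixel p, dot it with the source features at each candidate location p_i, and softmax over the candidate index i. The feature maps, pixel, and candidate set below are made up for illustration, not from the paper's code.

```python
# Minimal sketch of my reading of Eq. (4); all names/values here are placeholders.
import torch

C, H, W = 64, 32, 40          # channels and 1/8-resolution spatial size
F_ref = torch.randn(C, H, W)  # reference-view feature map at 1/8 resolution
F_src = torch.randn(C, H, W)  # source-view feature map at 1/8 resolution

p = (10, 17)                                            # query pixel in the reference view
candidates = [(10, 15), (10, 16), (10, 17), (10, 18)]   # hypothetical candidate pixels p_i

f_ref = F_ref[:, p[0], p[1]]                                    # (C,)
f_src = torch.stack([F_src[:, y, x] for (y, x) in candidates])  # (N, C)

logits = f_src @ f_ref                 # <F_ref(p), F_src(p_i)> for each i
S_sim = torch.softmax(logits, dim=0)   # softmax over the candidate index i
print(S_sim)
```

Is this the intended meaning, or is the softmax taken over a different set of positions?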
-
Thank you for the great work on FA3. I am wondering whether FA3 will soon support sliding window attention, as FA2 does?
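For reference, this is the FA2 usage I have in mind: since flash-attn 2.3, `flash_attn_func` accepts a `window_size=(left, right)` argument. The shapes and the 1024-token window below are just illustrative.

```python
# Sketch of FA2's sliding-window interface; shapes/window are illustrative only.
import torch
from flash_attn import flash_attn_func

B, S, H, D = 2, 4096, 16, 64
q = torch.randn(B, S, H, D, dtype=torch.float16, device="cuda")
k = torch.randn(B, S, H, D, dtype=torch.float16, device="cuda")
v = torch.randn(B, S, H, D, dtype=torch.float16, device="cuda")

# Causal attention restricted to the previous 1024 tokens
out = flash_attn_func(q, k, v, causal=True, window_size=(1024, 0))
```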
-
### System Info
Hi!
I'm running a speculative-execution TRT-LLM engine with a generation length of 4 or 5, and I noticed that FP8 KV-cache attention runs slower than FP16 KV-cache attention. Would be grea…
-
Hi,
I'm looking through the Differential Transformer paper and code,
and I found that the GitHub version is based on flash attention and rotary embeddings.
I wonder whether there is any plan to upload a simple example …
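For context, the kind of simple example I am hoping for is something like this naive single-head sketch of the differential attention formula, without flash attention or rotary embeddings. The scalar `lam` here is just a placeholder for the learned lambda reparameterization in the paper, and the shapes are made up.

```python
# Naive sketch of differential attention as I understand the paper:
# out = (softmax(Q1 K1^T / sqrt(d)) - lambda * softmax(Q2 K2^T / sqrt(d))) @ V
import torch
import torch.nn.functional as F

B, S, d = 2, 128, 64          # batch, sequence length, per-projection head dim
lam = 0.8                     # placeholder for the learned lambda

q1, q2 = torch.randn(B, S, d), torch.randn(B, S, d)
k1, k2 = torch.randn(B, S, d), torch.randn(B, S, d)
v = torch.randn(B, S, 2 * d)  # the paper uses a 2d-dimensional value per head pair

a1 = F.softmax(q1 @ k1.transpose(-2, -1) / d ** 0.5, dim=-1)
a2 = F.softmax(q2 @ k2.transpose(-2, -1) / d ** 0.5, dim=-1)
out = (a1 - lam * a2) @ v     # (B, S, 2d)
```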
-
I got a small freelance deal from [Kdot UK (aka Waterlily Labs)](https://www.waterlilylabs.com/) early this year.
The company's whole R&D team is located in Sri Lanka. The company is so poor…
-
I saw that flash attention was recently merged.
This approximate attention would be cool to have as well, for training on very large sequence lengths: https://github.com/HazyResearch/flash-attention/blob/m…
-
Hi,
thanks a lot for providing the SAMAPI extension for QuPath!
While the extension does work, segmentation takes a long time.
I have noticed the following user warning:
`UserWarning: Flash …
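In case it helps with debugging, here is a small snippet (separate from SAMAPI) to check whether a GPU and a flash-attention kernel are actually available on the machine; I assume the warning means the model falls back to a slower attention path, or possibly to CPU.

```python
# Quick environment check: GPU availability and flash-attention kernels.
import importlib.util
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("PyTorch flash SDP enabled:", torch.backends.cuda.flash_sdp_enabled())
print("flash_attn installed:", importlib.util.find_spec("flash_attn") is not None)
```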
-
Config:
Windows 10 with RTX4090
All requirements incl. flash-attn build - done!
Server:
```
(venv) D:\PythonProjects\hertz-dev>python inference_server.py
Using device: cuda
Loaded tokeniz…
```
-
Hi, I installed Flash-Attention-3 from source and ran test_flash_attn.py. I found that 5946 unit tests pass and 6 fail. Could you please help me resolve these failures?
My env:
+ source code commit id:…
-
Hi @zhuzilin, following up from https://github.com/zhuzilin/ring-flash-attention/issues/15
I just wanted to verify the causal case. I simply use a loop because I don't have multiple GPUs, but it should be wor…
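For clarity, here is a rough sketch of the kind of single-GPU loop I mean (not the actual test code, and not the ring-flash-attention API): visit the K/V sequence one chunk at a time, merge the partial outputs with a running log-sum-exp as a ring pass would, and compare against full causal attention.

```python
# Single-GPU simulation of a causal "ring" pass over K/V chunks (float32, naive attention).
import torch

torch.manual_seed(0)
B, H, S, D, CHUNK = 2, 4, 256, 64, 64
q = torch.randn(B, H, S, D)
k = torch.randn(B, H, S, D)
v = torch.randn(B, H, S, D)
scale = D ** -0.5

# Reference: full causal attention computed in one shot
causal = torch.tril(torch.ones(S, S, dtype=torch.bool))
ref = ((q @ k.transpose(-2, -1)) * scale).masked_fill(~causal, float("-inf")).softmax(-1) @ v

# "Ring" simulation: visit K/V chunk by chunk, keep running max / sum / output
out = torch.zeros_like(q)
row_max = torch.full((B, H, S, 1), float("-inf"))
row_sum = torch.zeros(B, H, S, 1)
qry_pos = torch.arange(S).unsqueeze(-1)
for start in range(0, S, CHUNK):
    k_c, v_c = k[:, :, start:start + CHUNK], v[:, :, start:start + CHUNK]
    s_c = (q @ k_c.transpose(-2, -1)) * scale
    key_pos = torch.arange(start, start + CHUNK)
    s_c = s_c.masked_fill(key_pos > qry_pos, float("-inf"))  # causal mask for this chunk
    new_max = torch.maximum(row_max, s_c.amax(-1, keepdim=True))
    correction = (row_max - new_max).exp()                    # rescale the old running state
    p = (s_c - new_max).exp()
    row_sum = row_sum * correction + p.sum(-1, keepdim=True)
    out = out * correction + p @ v_c
    row_max = new_max

out = out / row_sum
print((out - ref).abs().max())  # should be tiny (~1e-6 in float32)
```

Does this match what the multi-GPU causal path is supposed to compute?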