-
### ⚠️ Please check that this feature request hasn't been suggested before.
- [X] I searched previous [Ideas in Discussions](https://github.com/axolotl-ai-cloud/axolotl/discussions/categories/ideas) …
-
When I read the code in your nice_stand.py file, I didn't see self-attention or graph attention mechanisms being used, yet you describe this part in your paper.
![Image 1](https://github.com/eeyhsong/NICE-…
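For reference, here is a minimal sketch of the kind of self-attention block I expected to find (a generic illustration; the module name and dimensions are placeholders, not taken from nice_stand.py):

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Plain multi-head self-attention; a generic sketch, not NICE's code."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); queries, keys, and values all come from x
        out, _ = self.attn(x, x, x)
        return out
```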
-
Will there be 3090 support in Flash Attention 3 in the future?
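(As far as I know, FlashAttention 3 currently targets Hopper GPUs with compute capability 9.0, while the 3090 is Ampere at 8.6; a runtime gate like the sketch below is one way to fall back to FA2 in the meantime.)

```python
import torch

# Sketch: pick FA3 only on Hopper-class GPUs (compute capability >= 9.0);
# an RTX 3090 reports 8.6, so it would take the FA2 fallback path.
major, minor = torch.cuda.get_device_capability()
use_fa3 = (major, minor) >= (9, 0)
print(f"compute capability {major}.{minor} -> "
      f"{'FA3' if use_fa3 else 'FA2 fallback'}")
```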
-
I have a conda env where "FA2=True" and another env where "FA2=False" (as displayed in the terminal when running the fine-tuning script), yet the usable VRAM for tuning the same Gemma 2 model (2b or 9b) is the…
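To make the comparison between the two envs concrete, one option is PyTorch's allocator counters (this measures peak allocated memory for the process, not total GPU usage):

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run one fine-tuning step of the same Gemma 2 model here ...

peak_gib = torch.cuda.max_memory_allocated() / 2**30
print(f"peak allocated VRAM: {peak_gib:.2f} GiB")
```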
-
Hello, I would like to ask: what are the attention_cuda.py and attention_native.py files in the classification folder, and are they modules? I would be very grateful if your team could answer.
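I don't know this repository's layout, but files paired this way are often two interchangeable backends for the same operator; a purely hypothetical sketch of that pattern (names and structure are illustrative, not from the classification folder):

```python
import torch
import torch.nn.functional as F

def attention_native(q, k, v):
    """Reference backend in plain PyTorch ops (what attention_native.py
    might hold); always available, no custom kernels."""
    scale = q.shape[-1] ** -0.5
    return F.softmax(q @ k.transpose(-2, -1) * scale, dim=-1) @ v

def attention(q, k, v):
    # In this hypothetical pattern, a compiled CUDA backend (attention_cuda.py)
    # would be tried first; this sketch only carries the native fallback.
    return attention_native(q, k, v)
```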
-
### Description
Perhaps I am using this function incorrectly, but I get data leaks when using `key_value_seq_lengths`. It appears as though both the `xla` and `cudnn` implementations in jax nightly…
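A minimal check I would expect to separate the two behaviors, assuming the documented (batch, seq, heads, head_dim) layout of jax.nn.dot_product_attention; a nonzero difference would mean keys past the stated lengths are leaking into the output:

```python
import jax
import jax.numpy as jnp

B, T, S, N, H = 2, 4, 8, 2, 16
q = jax.random.normal(jax.random.PRNGKey(0), (B, T, N, H))
k = jax.random.normal(jax.random.PRNGKey(1), (B, S, N, H))
v = jax.random.normal(jax.random.PRNGKey(2), (B, S, N, H))
lengths = jnp.array([3, 5])  # valid key/value length per batch element

out_len = jax.nn.dot_product_attention(
    q, k, v, key_value_seq_lengths=lengths, implementation="xla")

# Reference: the same restriction expressed as an explicit mask.
mask = (jnp.arange(S)[None, :] < lengths[:, None])[:, None, None, :]  # (B,1,1,S)
out_ref = jax.nn.dot_product_attention(q, k, v, mask=mask)

print(jnp.max(jnp.abs(out_len - out_ref)))  # should be ~0 if nothing leaks
```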
-
```
*** Error loading script: attention.py
Traceback (most recent call last):
  File "C:\Users\ZeroCool22\Desktop\webui_forge\webui\modules\scripts.py", line 525, in load_scripts
    s…
```
-
Where and how will you find the people who will download the app? And how will you get their attention in a generation with such a short attention span?
-
This issue is not in response to a performance regression.
The method of performing cross-attention QKV computations introduced in #4942 could be improved. Because this issue relates to cross-atten…
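For context, the general shape of the computation under discussion (a generic sketch, not the code from #4942): in cross-attention, K and V depend only on the encoder output, so they can be projected once and reused at every decoding step, while only Q is recomputed:

```python
import torch
import torch.nn.functional as F

d = 64
w_q, w_k, w_v = (torch.randn(d, d) for _ in range(3))
enc = torch.randn(10, d)                 # encoder output: 10 source positions

k_cache, v_cache = enc @ w_k, enc @ w_v  # projected once, reused every step

def cross_attn_step(dec_state):          # dec_state: (1, d), one decoder position
    q = dec_state @ w_q
    attn = F.softmax(q @ k_cache.T / d ** 0.5, dim=-1)
    return attn @ v_cache

out = cross_attn_step(torch.randn(1, d))
```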
-
The paper mentions grouping the nodes, which theoretically reduces the computational complexity. In the part of the code that computes spatial attention, where is this grouped computation reflected?
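(For what it's worth, the complexity argument itself looks like the sketch below: splitting N nodes into G groups replaces one O(N²) attention with G independent O((N/G)²) ones. This is a generic illustration, not this repository's code.)

```python
import torch
import torch.nn.functional as F

N, G, d = 64, 4, 32
x = torch.randn(N, d)

xg = x.view(G, N // G, d)                      # (G, N/G, d): nodes split into groups
scores = xg @ xg.transpose(-2, -1) / d ** 0.5  # (G, N/G, N/G): attention per group
out = (F.softmax(scores, dim=-1) @ xg).reshape(N, d)
```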