-
Can you tell me how the attention mechanism is applied? I have been looking at your source code for a long time but still can't tell whether the attention mechanism is applied to the generator or discriminat…
-
Thanks for your repository, which gives me a lot of inspiration. To the best of my knowledge, the attention or pointer mechanism is popular in sequence-to-sequence tasks such as chatbots. I have read the attention m…
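For context, here is a minimal sketch of the additive (Bahdanau-style) attention typically used in such seq2seq models; the class name, layer names, and tensor shapes are illustrative assumptions, not code from this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: score(s, h_i) = v^T tanh(W_q s + W_k h_i)."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.w_query = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects the decoder state
        self.w_keys = nn.Linear(hidden_dim, hidden_dim, bias=False)   # projects the encoder outputs
        self.v = nn.Linear(hidden_dim, 1, bias=False)                 # scoring vector

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state: (batch, hidden); encoder_outputs: (batch, src_len, hidden)
        scores = self.v(torch.tanh(
            self.w_query(decoder_state).unsqueeze(1) + self.w_keys(encoder_outputs)
        )).squeeze(-1)                                   # (batch, src_len)
        weights = F.softmax(scores, dim=-1)              # distribution over source tokens
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)  # (batch, hidden)
        return context, weights
```

A pointer mechanism reuses `weights` directly as a copy distribution over the source tokens, instead of (or in addition to) feeding `context` back into the decoder.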
-
# Attention Mechanism: From the Seq2Seq Model to the Transformer Model | Reinventing the Wheel
[https://heekangpark.github.io/nlp/attention](https://heekangpark.github.io/nlp/attention)
-
{
"base_config": "configs/HighwayEnv/agents/DQNAgent/ddqn.json",
"model": {
"type": "EgoAttentionNetwork",
"embedding_layer": {
"type": "MultiLayerPerceptron",…
-
# 🚀 Feature request
I've looked into the paper titled "[EL-Attention: Memory Efficient Lossless Attention for Generation](https://arxiv.org/abs/2105.04779)".
It proposes a method for calculating att…
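For readers unfamiliar with the paper, here is a sketch of the core rearrangement for a single head and one decoding step; the function and argument names are hypothetical, and this shows only the algebra behind the idea, not a proposed implementation.

```python
import torch

def el_attention_step(q, hidden, w_q, w_k, w_v, scale):
    """
    One decoding step of EL-attention for a single head (a sketch, not the paper's code).

    Standard attention caches K = hidden @ w_k and V = hidden @ w_v per head.
    EL-attention caches only `hidden` and folds the key/value projections into the
    query side and the output side, which is algebraically identical:
        scores = (q @ w_q) @ (hidden @ w_k).T  ==  ((q @ w_q) @ w_k.T) @ hidden.T
        out    = probs @ (hidden @ w_v)        ==  (probs @ hidden) @ w_v
    """
    # q: (batch, 1, d_model); hidden: (batch, src_len, d_model)
    # w_q, w_k, w_v: (d_model, d_head); scale is the usual 1/sqrt(d_head)
    q_el = (q @ w_q) @ w_k.T                        # project the query by both W_q and W_k^T
    scores = q_el @ hidden.transpose(1, 2) * scale  # keys are the raw, unprojected hidden states
    probs = torch.softmax(scores, dim=-1)
    out = (probs @ hidden) @ w_v                    # apply the value projection after mixing
    return out                                      # (batch, 1, d_head)
```

Because only `hidden` is cached (once, shared across heads) rather than per-head keys and values, the cache memory no longer scales with the number of heads while the scores and outputs stay numerically equivalent.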
-
Hi,
First of all, great work. I am a big proponent of Flan-T5 and use it in my projects. For multilingual, the mT5 and bigscience/mt0 models provide a good baseline and are truly multilingual. Does Flash…
-
### Model description
Here is the model description:
> gte-Qwen1.5-7B-instruct is the latest addition to the gte embedding family. This model has been engineered starting from the [Qwen1.5-7B](https:…
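If it helps the integration, here is a sketch of how sentence embeddings are typically extracted from this kind of decoder-only embedding model; the model id, `trust_remote_code` flag, and last-token pooling are assumptions based on the model card, not a verified recipe.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Alibaba-NLP/gte-Qwen1.5-7B-instruct"   # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.padding_side = "right"                    # so the last non-pad token is easy to index
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.float16)

texts = ["what is the capital of France?", "Paris is the capital of France."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state       # (batch, seq_len, dim)

# Last-token pooling: take the hidden state of each sequence's final non-pad token.
lengths = batch["attention_mask"].sum(dim=1) - 1
embeddings = hidden[torch.arange(hidden.size(0)), lengths]
embeddings = torch.nn.functional.normalize(embeddings, dim=-1)
```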
-
The Python inference code provided seems to be the same as "normal" Whisper, so where is the speedup coming from? Flash attention?
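For comparison, this is roughly what enabling FlashAttention looks like when loading Whisper through Hugging Face `transformers`; whether this repository takes the same route is an assumption, and the checkpoint name is only an example.

```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Requires the flash-attn package and a supported GPU; otherwise loading will fail.
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3",                     # example checkpoint, not necessarily the one used here
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",       # swap the attention kernel, nothing else changes
).to("cuda")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
```

If that is what is happening, the Python call sites stay identical and the speedup comes from the attention kernel (plus fp16), not from any change to the decoding loop.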
-
I have read your code for split-attention and found that you apply ReLU before the split-attention.
https://github.com/zhanghang1989/ResNeSt/blob/76debaa9b9444742599d104609b8ee984b207332/resnest/torch…
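For reference, here is a simplified sketch of a radix-style split-attention block (cardinality and several details are omitted; this is not the ResNeSt source), showing where that ReLU sits relative to the attention computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitAttentionSketch(nn.Module):
    """Simplified radix-2 split attention (cardinality omitted for brevity)."""

    def __init__(self, channels, radix=2, reduction=4):
        super().__init__()
        self.radix = radix
        self.conv = nn.Conv2d(channels, channels * radix, 3, padding=1, groups=radix)
        self.bn = nn.BatchNorm2d(channels * radix)
        self.fc1 = nn.Conv2d(channels, channels // reduction, 1)
        self.bn1 = nn.BatchNorm2d(channels // reduction)
        self.fc2 = nn.Conv2d(channels // reduction, channels * radix, 1)

    def forward(self, x):
        b, c = x.shape[0], x.shape[1]
        x = F.relu(self.bn(self.conv(x)))                 # the ReLU in question: it follows the
                                                          # grouped conv + BN, before any attention
        splits = x.view(b, self.radix, c, *x.shape[2:])   # (b, radix, c, h, w)
        gap = splits.sum(dim=1).mean(dim=(2, 3), keepdim=True)  # fused global pooling: (b, c, 1, 1)
        att = self.fc2(F.relu(self.bn1(self.fc1(gap))))         # per-radix, per-channel logits
        att = att.view(b, self.radix, c, 1, 1).softmax(dim=1)   # rSoftMax over the radix axis
        return (att * splits).sum(dim=1)                         # weighted sum of the splits
```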
-
We know that FlashAttention supports `cu_seqlens`, which removes padding for variable-length inputs in a batch and stores only the real tokens. This can be useful for optimizing the computational eff…
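As a minimal sketch of that packed (unpadded) layout, assuming the `flash-attn` Python package's `flash_attn_varlen_func`; the sequence lengths and head sizes below are arbitrary.

```python
import torch
from flash_attn import flash_attn_varlen_func  # assumes the flash-attn package is installed

# Three sequences of lengths 5, 2 and 9 are concatenated into 16 tokens with no padding;
# cu_seqlens marks their boundaries as cumulative offsets.
seqlens = torch.tensor([5, 2, 9], dtype=torch.int32, device="cuda")
cu_seqlens = torch.nn.functional.pad(seqlens.cumsum(0, dtype=torch.int32), (1, 0))  # [0, 5, 7, 16]
total, n_heads, head_dim = int(seqlens.sum()), 8, 64

q = torch.randn(total, n_heads, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=int(seqlens.max()), max_seqlen_k=int(seqlens.max()),
    causal=True,   # attention never crosses the sequence boundaries given by cu_seqlens
)
# out: (total_tokens, n_heads, head_dim), in the same packed layout as q
```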