-
Can you tell me how the attention mechanism is applied? I have been looking at your source code for a long time but still can't tell whether the attention mechanism is applied to the generator or discriminat…
-
Thanks for your repository, which gives me a lot of inspiration. To the best of my knowledge, the attention or pointer mechanism is popular in sequence-to-sequence tasks such as chatbots. I have read the attention m…
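For context, here is a minimal sketch of the additive (Bahdanau-style) attention typically used in such seq2seq models; the class name, layer names, and tensor shapes are illustrative assumptions, not code from this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: score(s, h_i) = v^T tanh(W_q s + W_k h_i)."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.w_query = nn.Linear(hidden_dim, hidden_dim, bias=False)  # projects the decoder state
        self.w_keys = nn.Linear(hidden_dim, hidden_dim, bias=False)   # projects the encoder outputs
        self.v = nn.Linear(hidden_dim, 1, bias=False)                 # scoring vector

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state: (batch, hidden); encoder_outputs: (batch, src_len, hidden)
        scores = self.v(torch.tanh(
            self.w_query(decoder_state).unsqueeze(1) + self.w_keys(encoder_outputs)
        )).squeeze(-1)                                   # (batch, src_len)
        weights = F.softmax(scores, dim=-1)              # distribution over source tokens
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)  # (batch, hidden)
        return context, weights
```

A pointer mechanism reuses `weights` directly as a copy distribution over the source tokens, instead of (or in addition to) feeding `context` back into the decoder.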
-
# Attention Mechanism: From the Seq2Seq Model to the Transformer Model | Reinventing the Wheel
[https://heekangpark.github.io/nlp/attention](https://heekangpark.github.io/nlp/attention)
-
{
"base_config": "configs/HighwayEnv/agents/DQNAgent/ddqn.json",
"model": {
"type": "EgoAttentionNetwork",
"embedding_layer": {
"type": "MultiLayerPerceptron",…
-
# 🚀 Feature request
I've looked into the paper titled "[EL-Attention: Memory Efficient Lossless Attention for Generation](https://arxiv.org/abs/2105.04779)".
It proposes a method for calculating att…
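For readers unfamiliar with the paper, here is a sketch of the core rearrangement for a single head and one decoding step; the function and argument names are hypothetical, and this shows only the algebra behind the idea, not a proposed implementation.

```python
import torch

def el_attention_step(q, hidden, w_q, w_k, w_v, scale):
    """
    One decoding step of EL-attention for a single head (a sketch, not the paper's code).

    Standard attention caches K = hidden @ w_k and V = hidden @ w_v per head.
    EL-attention caches only `hidden` and folds the key/value projections into the
    query side and the output side, which is algebraically identical:
        scores = (q @ w_q) @ (hidden @ w_k).T  ==  ((q @ w_q) @ w_k.T) @ hidden.T
        out    = probs @ (hidden @ w_v)        ==  (probs @ hidden) @ w_v
    """
    # q: (batch, 1, d_model); hidden: (batch, src_len, d_model)
    # w_q, w_k, w_v: (d_model, d_head); scale is the usual 1/sqrt(d_head)
    q_el = (q @ w_q) @ w_k.T                        # project the query by both W_q and W_k^T
    scores = q_el @ hidden.transpose(1, 2) * scale  # keys are the raw, unprojected hidden states
    probs = torch.softmax(scores, dim=-1)
    out = (probs @ hidden) @ w_v                    # apply the value projection after mixing
    return out                                      # (batch, 1, d_head)
```

Because only `hidden` is cached (once, shared across heads) rather than per-head keys and values, the cache memory no longer scales with the number of heads while the scores and outputs stay numerically equivalent.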
-
Hi,
First of all, great work. I am a big proponent of Flan-T5 and use it in my projects. For multilingual, the mT5 and bigscience/mt0 models provide a good baseline and are truly multilingual. Does Flash…
-
### Model description
Here is the model description:
> gte-Qwen1.5-7B-instruct is the latest addition to the gte embedding family. This model has been engineered starting from the [Qwen1.5-7B](https:…
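If it helps the integration, here is a sketch of how sentence embeddings are typically extracted from this kind of decoder-only embedding model; the model id, `trust_remote_code` flag, and last-token pooling are assumptions based on the model card, not a verified recipe.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Alibaba-NLP/gte-Qwen1.5-7B-instruct"   # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.padding_side = "right"                    # so the last non-pad token is easy to index
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(model_id, trust_remote_code=True, torch_dtype=torch.float16)

texts = ["what is the capital of France?", "Paris is the capital of France."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state       # (batch, seq_len, dim)

# Last-token pooling: take the hidden state of each sequence's final non-pad token.
lengths = batch["attention_mask"].sum(dim=1) - 1
embeddings = hidden[torch.arange(hidden.size(0)), lengths]
embeddings = torch.nn.functional.normalize(embeddings, dim=-1)
```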
-
The Python inference code provided seems to be the same as "normal" Whisper, so where is the speedup coming from? Flash attention?
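For comparison, this is roughly what enabling FlashAttention looks like when loading Whisper through Hugging Face `transformers`; whether this repository takes the same route is an assumption, and the checkpoint name is only an example.

```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Requires the flash-attn package and a supported GPU; otherwise loading will fail.
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3",                     # example checkpoint, not necessarily the one used here
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",       # swap the attention kernel, nothing else changes
).to("cuda")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
```

If that is what is happening, the Python call sites stay identical and the speedup comes from the attention kernel (plus fp16), not from any change to the decoding loop.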
-
I have read your code for split-attention and found that you apply ReLU before the split-attention.
https://github.com/zhanghang1989/ResNeSt/blob/76debaa9b9444742599d104609b8ee984b207332/resnest/torch…
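For reference, here is a simplified sketch of a radix-style split-attention block (cardinality and several details are omitted; this is not the ResNeSt source), showing where that ReLU sits relative to the attention computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitAttentionSketch(nn.Module):
    """Simplified radix-2 split attention (cardinality omitted for brevity)."""

    def __init__(self, channels, radix=2, reduction=4):
        super().__init__()
        self.radix = radix
        self.conv = nn.Conv2d(channels, channels * radix, 3, padding=1, groups=radix)
        self.bn = nn.BatchNorm2d(channels * radix)
        self.fc1 = nn.Conv2d(channels, channels // reduction, 1)
        self.bn1 = nn.BatchNorm2d(channels // reduction)
        self.fc2 = nn.Conv2d(channels // reduction, channels * radix, 1)

    def forward(self, x):
        b, c = x.shape[0], x.shape[1]
        x = F.relu(self.bn(self.conv(x)))                 # the ReLU in question: it follows the
                                                          # grouped conv + BN, before any attention
        splits = x.view(b, self.radix, c, *x.shape[2:])   # (b, radix, c, h, w)
        gap = splits.sum(dim=1).mean(dim=(2, 3), keepdim=True)  # fused global pooling: (b, c, 1, 1)
        att = self.fc2(F.relu(self.bn1(self.fc1(gap))))         # per-radix, per-channel logits
        att = att.view(b, self.radix, c, 1, 1).softmax(dim=1)   # rSoftMax over the radix axis
        return (att * splits).sum(dim=1)                         # weighted sum of the splits
```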
-
We know that FlashAttention supports `cu_seqlens`, which removes padding for variable-length inputs in a batch and stores only the real tokens. This can be useful for optimizing the computational eff…
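As a minimal sketch of that packed (unpadded) layout, assuming the `flash-attn` Python package's `flash_attn_varlen_func`; the sequence lengths and head sizes below are arbitrary.

```python
import torch
from flash_attn import flash_attn_varlen_func  # assumes the flash-attn package is installed

# Three sequences of lengths 5, 2 and 9 are concatenated into 16 tokens with no padding;
# cu_seqlens marks their boundaries as cumulative offsets.
seqlens = torch.tensor([5, 2, 9], dtype=torch.int32, device="cuda")
cu_seqlens = torch.nn.functional.pad(seqlens.cumsum(0, dtype=torch.int32), (1, 0))  # [0, 5, 7, 16]
total, n_heads, head_dim = int(seqlens.sum()), 8, 64

q = torch.randn(total, n_heads, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=int(seqlens.max()), max_seqlen_k=int(seqlens.max()),
    causal=True,   # attention never crosses the sequence boundaries given by cu_seqlens
)
# out: (total_tokens, n_heads, head_dim), in the same packed layout as q
```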