-
[https://arxiv.org/pdf/1608.05745v3.pdf](https://arxiv.org/pdf/1608.05745v3.pdf)
> Accuracy and interpretation are two goals of any successful predictive models. Most existing works have to suffer …
-
# speech recognition
- Soltau, Hagen, Hank Liao, and Hasim Sak. "Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition." arXiv preprint arXiv:1610.09975 (201…
-
Hi, I'm trying to implement the Deep Recurrent Attention Model described in the paper http://arxiv.org/pdf/1412.7755v2.pdf and apply it to image caption generation instead of image classification. I will …
-
**Abstract:**
> The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also conn…
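The core building block the abstract refers to is scaled dot-product attention, `softmax(Q K^T / sqrt(d_k)) V`. A minimal NumPy sketch (shapes and the random toy inputs below are illustrative, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of values

# Toy shapes: 3 queries, 4 keys/values, d_k = 8, d_v = 5
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 5))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 5)
```

Each output row is a convex combination of the value rows, with mixing weights given by query–key similarity; the full model stacks this with multiple heads and learned projections.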
-
This might be a Keras problem, but have you tried serializing some of the layers? I tried the following to save a model that contains `SoftAttention`:
```
from keras.engine import Input, Model
…
```
-
I'm currently writing a recurrent reinforcement library, with LSTMs, linear attention, etc that I would like to add S4 to.
Unfortunately, I find S4D unable to learn in even simple RL tasks (e.g. outp…
-
-
I'm a little confused about what RetNet does in practice. In the formula `Retention(X) = (Q @ K.T * D) @ V`, if the *decay* is 1, the mathematical derivation proving the equivalence between …
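The parallel form of retention quoted above can be sketched directly in NumPy. Here `D[n, m] = γ^(n-m)` for `n ≥ m` and 0 otherwise (a causal mask with exponential decay); the shapes and `gamma` values are illustrative:

```python
import numpy as np

def retention(Q, K, V, gamma):
    """Parallel-form retention: (Q @ K.T * D) @ V, where
    D[n, m] = gamma**(n - m) for n >= m and 0 otherwise."""
    T = Q.shape[0]
    n = np.arange(T)
    D = np.where(n[:, None] >= n[None, :],
                 float(gamma) ** (n[:, None] - n[None, :]), 0.0)
    return (Q @ K.T * D) @ V

rng = np.random.default_rng(0)
T, d = 6, 4
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))

out_decay = retention(Q, K, V, gamma=0.9)

# With decay gamma = 1, D reduces to the plain causal (lower-triangular)
# mask, so retention coincides with unnormalized causal linear attention.
out_no_decay = retention(Q, K, V, gamma=1.0)
mask = np.tril(np.ones((T, T)))
assert np.allclose(out_no_decay, (Q @ K.T * mask) @ V)
```

This makes the `decay = 1` case concrete: the decay matrix `D` degenerates to a causal mask, which is the setting in which the parallel/recurrent equivalence derivation is usually stated.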
-
Hello, I recently read your code, and I don't quite understand the Attention layer part, mainly the input shape, the shapes of the intermediate layers, and the meaning of each variable. After reading it I still haven't figured out what the MTL task is, i.e., what problem domain the code solves. Could you write some documentation to introduce it? Thank you.
-
Hello Professor!
I have a question about this subsection: "compare the performance against the naive LSTM approach." Is there a specific architecture that I need to compare my target solution with?…