asmekal / keras-monotonic-attention

seq2seq attention in keras
GNU Affero General Public License v3.0

Output Data Format #8

Closed songeater closed 5 years ago

songeater commented 6 years ago

Hi - it seems that the original paper and this implementation address a target output that is one-hot encoded. To make this work with targets / y-values that are real numbers (I use LSTMs to experiment with non-quantized audio), would I just have to change the softmax activation in the yt calculation of the step() function below, e.g. to sigmoid or tanh?

        yt = activations.softmax(
            K.dot(ytm, self.W_o)
            + K.dot(st, self.U_o)
            + K.dot(context, self.C_o)
            + self.b_o)

Or does the attention concept, as described in the paper, not work with real-valued targets/outputs? I know that LSTM models typically work best with quantized data / one-hot vectors squashed through a softmax... but real-valued output is what I am playing with. This is neat work... thanks!
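
For concreteness, the naive change I have in mind is something like the sketch below (a standalone snippet with dummy tensors standing in for the layer's real weights, not the actual layer code):

    import numpy as np
    from keras import backend as K
    from keras import activations

    units, out_dim = 4, 3  # dummy sizes, just to make the snippet runnable

    # stand-ins for ytm / st / context and the layer weights W_o / U_o / C_o / b_o
    ytm = K.constant(np.random.rand(1, out_dim))
    st = K.constant(np.random.rand(1, units))
    context = K.constant(np.random.rand(1, units))
    W_o = K.constant(np.random.rand(out_dim, out_dim))
    U_o = K.constant(np.random.rand(units, out_dim))
    C_o = K.constant(np.random.rand(units, out_dim))
    b_o = K.constant(np.zeros(out_dim))

    # same affine combination as above, but with tanh instead of softmax
    yt = activations.tanh(
        K.dot(ytm, W_o)
        + K.dot(st, U_o)
        + K.dot(context, C_o)
        + b_o)
    print(K.eval(yt))  # real values in (-1, 1), shape (1, out_dim)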

asmekal commented 6 years ago

The attention concept can be used for real-valued targets as well, although the implementation in this repo does not support that out of the box. If you want to do it, just changing the softmax to another activation may not be enough - look carefully at all the places where yt, ytm and y0 are used (most likely in step and get_initial_state) and change them accordingly (in the current implementation there is an argmax, and then embeddings of the resulting one-hot vectors are used).
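
Very roughly, the idea is something like the untested sketch below (dummy tensors stand in for the layer's real weights; this is not the layer's actual code): output with tanh (or linear) instead of softmax, feed the real-valued prediction straight back as ytm instead of taking an argmax and embedding the one-hot result, and start from a zero y0 in get_initial_state instead of an embedded start token.

    import numpy as np
    from keras import backend as K
    from keras import activations

    out_dim, units = 3, 4  # dummy sizes

    # stand-ins for the layer weights self.W_o / self.U_o / self.C_o / self.b_o
    W_o = K.constant(np.random.rand(out_dim, out_dim))
    U_o = K.constant(np.random.rand(units, out_dim))
    C_o = K.constant(np.random.rand(units, out_dim))
    b_o = K.constant(np.zeros(out_dim))

    def output_step(ytm, st, context):
        # tanh (or linear) instead of softmax for real-valued outputs
        return activations.tanh(
            K.dot(ytm, W_o) + K.dot(st, U_o) + K.dot(context, C_o) + b_o)

    # get_initial_state analogue: start from a zero vector, no start-token embedding
    ytm = K.zeros((1, out_dim))
    for _ in range(3):
        # pretend st and context come from the recurrence / attention as usual
        st = K.constant(np.random.rand(1, units))
        context = K.constant(np.random.rand(1, units))
        yt = output_step(ytm, st, context)
        ytm = yt  # direct feedback: no argmax, no embedding lookup
    print(K.eval(yt).shape)  # (1, out_dim)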

songeater commented 6 years ago

Thank you - yes, the argmax / one-hot embeddings definitely have to be handled. Will post my modifications - if I am able to make them!