codekansas / keras-language-modeling

:book: Some language modeling tools for Keras
https://codekansas.github.io/language
MIT License

Example script for AttentionLSTM #36

Open · Spider101 opened this issue 7 years ago

Spider101 commented 7 years ago

I am having a bit of trouble understanding how to incorporate the AttentionLSTM layer into my code. In your blog post you said, "The attentional component can be tacked onto the LSTM code that already exists." But unlike a standard LSTM, this custom layer requires a second parameter, the attention vector. I therefore tried the following code to build my model:

from keras.layers import Input
from keras.models import Model
from attention_lstm import AttentionLSTM  # custom layer from this repo

seq_len, input_dims, output_dims = 200, 4096, 512
input_seq = Input(shape=(seq_len, input_dims), dtype='float32')
attn = AttentionLSTM(output_dims, input_seq)(input_seq)
model = Model(input=input_seq, output=attn)

However, I get the following error: ValueError: Dimensions 4096 and 200 are not compatible.

My main trouble is understanding what the attention vector passed to the layer should be, according to your class specification. I know conceptually, from the Show, Attend and Tell paper, that the attention vector should be each of the 1x4096 vectors, but I can't figure out how to pass that into the AttentionLSTM layer.

It would be very helpful if you could provide a gist or example script demonstrating how to use the AttentionLSTM layer, just like you did with the different RNNs in your blog post!
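
For anyone hitting the same error: the layer appears to expect its attention vector to be a fixed-size 2D tensor of shape (batch, dim), such as the output of another encoder, rather than the 3D input sequence itself, which would explain the 4096-vs-200 mismatch above. Below is a minimal sketch along those lines; the use of a plain LSTM to produce the attention vector and the attention_lstm import path are assumptions, not the repository's documented usage.

from keras.layers import Input, LSTM
from keras.models import Model
from attention_lstm import AttentionLSTM  # assumed import path for the custom layer

seq_len, input_dims, output_dims = 200, 4096, 512

# The sequence the attentional LSTM will read.
input_seq = Input(shape=(seq_len, input_dims), dtype='float32')

# The attention vector is a fixed-size 2D tensor (batch, dim), not the
# raw 3D sequence. Here a plain LSTM encodes the sequence down to one
# vector; it could equally come from a separate input branch.
attn_vec = LSTM(output_dims)(input_seq)

# Attend over attn_vec while processing input_seq.
attn = AttentionLSTM(output_dims, attn_vec)(input_seq)
model = Model(input=input_seq, output=attn)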

xymtxwd commented 7 years ago

Hi, did you find a solution to this? I ran into the same problem and was hoping to get help as well.

Spider101 commented 7 years ago

Not really. I switched to PyTorch to get what I needed done; there's much more flexibility there. Sorry, I wish I could have been of more help!