farizrahman4u / recurrentshop

Framework for building complex recurrent neural networks with Keras
MIT License

Clarification question about readouts #85

Open tom-christie opened 6 years ago

tom-christie commented 6 years ago

I'm trying to build a custom RNN architecture, and after banging my head against the Keras source code for a while I ended up here. recurrentshop is a super neat and helpful project, and I think it will help me do what I want, but I'm stuck.

I'm trying to build a network with the following architecture -

- X: input
- H: hidden state
- Y: output
- t: time step

- Xt --> Ht is defined by a weight matrix Wxh (this is `kernel` in the SimpleRNNCell).
- H_tm1 --> Ht is defined by a weight matrix Whh (this is `recurrent_kernel` in the SimpleRNNCell).
- Ht --> Yt would be defined by a second layer and matrix Why, since I want it to be a secondary transformation that converts the hidden state to a 1-dimensional output at each time step.
- Y_tm1 --> Ht is the hard part, defined by a matrix Wyh.

If I understand correctly the architecture is somewhat similar to your readout example. However I'd like to incorporate Y_tm1 into the state by treating it as a '1st class' input like so:

```python
Ht = K.dot(Xt, Wxh) + K.dot(H_tm1, Whh) + K.dot(Y_tm1, Wyh)
Ht = tanh(Ht)
```
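To make the recurrence concrete, here is one way to sketch it in plain NumPy, including the per-step readout Yt = Ht · Why. The dimensions, initialization, and the zero initial state/output are all illustrative assumptions on my part, and the weights would of course be learned rather than random:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, T = 8, 16, 1, 5

# Illustrative weights (learned in practice)
Wxh = rng.normal(size=(input_dim, hidden_dim))   # input -> hidden ('kernel')
Whh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden -> hidden ('recurrent_kernel')
Wyh = rng.normal(size=(output_dim, hidden_dim))  # previous output -> hidden (the matrix to learn)
Why = rng.normal(size=(hidden_dim, output_dim))  # hidden -> 1-d output

X = rng.normal(size=(T, input_dim))  # one input vector per time step
h = np.zeros(hidden_dim)             # H_0
y = np.zeros(output_dim)             # Y_0

outputs = []
for t in range(T):
    # Ht = tanh(Xt.Wxh + H_tm1.Whh + Y_tm1.Wyh)
    h = np.tanh(X[t] @ Wxh + h @ Whh + y @ Wyh)
    # Yt = Ht.Why  (per-step readout)
    y = h @ Why
    outputs.append(y)

Y = np.stack(outputs)  # shape (T, output_dim)
```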

The readout example showed how to add or multiply X by the previous output Y, but I'd also like to learn the Wyh matrix. I think that means I need to include a new Dense() layer somewhere, but I'm having a hard time figuring out where. I'm using this document as a starting point; I'd appreciate any help you could give! For reference, I tried rewriting the SimpleRNNCell class to include a two-part state (one for Ht and one for 'hidden' inside the cell) and ended up with a cryptic Keras error that I didn't understand.
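For what it's worth, the "two-part state" idea can be sketched outside recurrentshop as a custom cell for `tf.keras.layers.RNN`: the state carries both Ht and Y_tm1, so the previous output is fed back through a learned Wyh, and Why plays the role of the per-step Dense. The class name, dimensions, and weight names here are my own illustrative assumptions, not recurrentshop's API:

```python
import tensorflow as tf

class ReadoutFeedbackCell(tf.keras.layers.Layer):
    """RNN cell with a two-part state (h, y): the previous output
    y_tm1 is fed back into the hidden update through a learned Wyh."""

    def __init__(self, units, output_dim=1, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.output_dim = output_dim
        self.state_size = [units, output_dim]  # [Ht, Y_tm1]
        self.output_size = output_dim

    def build(self, input_shape):
        input_dim = input_shape[-1]
        self.Wxh = self.add_weight(shape=(input_dim, self.units), name="Wxh")
        self.Whh = self.add_weight(shape=(self.units, self.units), name="Whh")
        self.Wyh = self.add_weight(shape=(self.output_dim, self.units), name="Wyh")
        self.Why = self.add_weight(shape=(self.units, self.output_dim), name="Why")

    def call(self, x_t, states):
        h_tm1, y_tm1 = states
        # Ht = tanh(Xt.Wxh + H_tm1.Whh + Y_tm1.Wyh)
        h_t = tf.tanh(tf.matmul(x_t, self.Wxh)
                      + tf.matmul(h_tm1, self.Whh)
                      + tf.matmul(y_tm1, self.Wyh))
        # Yt = Ht.Why  (the per-step "Dense" readout)
        y_t = tf.matmul(h_t, self.Why)
        return y_t, [h_t, y_t]

layer = tf.keras.layers.RNN(ReadoutFeedbackCell(units=16), return_sequences=True)
out = layer(tf.random.normal((2, 5, 8)))  # (batch, T, input_dim) -> (batch, T, 1)
```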