JonathanRaiman / theano_lstm

:microscope: Nano size Theano LSTM module

Masking operation erroneous #22

zaxliu commented 8 years ago

Hi @JonathanRaiman , first off thanks for setting up this repo. Learned a lot from your implementation.

I have a question regarding the masking operation for variable-length inputs. You mention in the docs that:

> Elementwise multiply this mask with your output, and then apply your objective function to this masked output. The error will be obtained everywhere, but will be zero in areas that were masked, yielding the correct error function.
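
For concreteness, I read that as something like the following minimal sketch (the names `pred`, `target`, and `mask` are my own placeholders, not code from this repo):

```python
import theano.tensor as T

# Placeholder symbolic variables: `pred` and `target` are
# (time, batch) matrices; `mask` is a (time, batch) matrix of
# ones at real timesteps and zeros at padding.
pred = T.matrix('pred')
target = T.matrix('target')
mask = T.matrix('mask')

# Elementwise-multiply the per-timestep error by the mask, so
# padded positions contribute zero to the objective ...
masked_error = ((pred - target) ** 2) * mask

# ... then average over the unmasked positions only.
loss = masked_error.sum() / mask.sum()
```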

But doing so only cuts off the back-propagation (BP) paths from the masked outputs; the BP paths from unmasked outputs that run through masked hidden units remain. The resulting loss function is still erroneous.

I notice from other LSTM implementations (e.g. Lasagne) that the usual approach is to use the mask to switch between the previous hidden state (if the current input is masked) and the newly computed hidden state (otherwise). In this way, both types of unwanted BP paths (masked outputs -> masked hidden -> weights, and unmasked outputs -> masked hidden -> weights) are blocked, as sketched below.
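
Something along these lines (a rough sketch of that switching recurrence; a plain tanh RNN step stands in for the full LSTM cell, and all names here are my own, not Lasagne's or this repo's):

```python
import theano
import theano.tensor as T

def step(x_t, m_t, h_prev, W_x, W_h):
    # Candidate hidden state from the usual recurrence (a plain
    # tanh RNN update stands in for the full LSTM cell here).
    h_cand = T.tanh(T.dot(x_t, W_x) + T.dot(h_prev, W_h))
    # Where the mask is 0 (padding), carry the previous hidden
    # state forward unchanged; where it is 1, take the update.
    m = m_t.dimshuffle(0, 'x')  # (batch,) -> (batch, 1) for broadcasting
    return m * h_cand + (1 - m) * h_prev

# x: (time, batch, n_in), mask: (time, batch), h0: (batch, n_hid)
x = T.tensor3('x')
mask = T.matrix('mask')
h0 = T.matrix('h0')
W_x = T.matrix('W_x')
W_h = T.matrix('W_h')

h, _ = theano.scan(step,
                   sequences=[x, mask],
                   outputs_info=[h0],
                   non_sequences=[W_x, W_h])
```

Since the hidden state at a masked step is exactly `h_prev`, no gradient can flow into the weights through padded timesteps.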

Please correct me if I'm wrong. Am I missing something?