EderSantana / seya

Bringing up some extra Cosmo to Keras.

Compatibility issue with keras 0.3.2 #23

Closed jeammimi closed 8 years ago

jeammimi commented 8 years ago

Hi, I am using keras 0.3.2 and when using the NTM layer I got an error:

    574
    575     def step(self, x, states):
    --> 576         assert len(states) == 4  # 2 states and 2 constants
    577         h_tm1 = states[0]
    578         c_tm1 = states[1]

    AssertionError:

This is related to the fact that the states of the LSTM in the new version of Keras are now a list with four components: the hidden state, the cell state, and two "constants".

I made it work by changing your update controller function:

def _update_controller(self, inp, h_tm1, M):
    """We have to update the inner RNN inside the NTM, this
    is the function to do it. Pretty much copy+pasta from Keras.
    """
    x = T.concatenate([inp, M], axis=-1)
    # Newer Keras expects step() to receive the two recurrent states
    # plus the two "constants" from get_constants(); if we only got
    # (h, c), append the constants so the assertion passes.
    if len(h_tm1) == 2:
        BW, BU = self.rnn.get_constants(x)
        h_tm1 += (BW, BU)

    _, h = self.rnn.step(x, h_tm1)

    return h

But it is probably dependent on the Keras version, and I am not sure when it changed.

Another question: I saw that in the tutorial you reach an accuracy of 97%.

How long did you train it? Did you change the learning rate? Right now I am kind of stuck at 90% accuracy and a loss of 0.35.

EderSantana commented 8 years ago

This implementation is really unstable after some recent changes in Keras. If it doesn't work for you with the commit I mentioned in the notebook, then we've got a problem. I'll probably have to review my code from scratch since a lot has changed in Keras... In the meantime, if you need an NTM I'd recommend you take a look at https://github.com/snipsco/ntm-lasagne

In my next approach I'll probably just translate their code to Keras.

jeammimi commented 8 years ago

It is OK; with the modification that I mentioned for _update_controller, it is working. The only strange thing is that the accuracy I get is around 90% while you get around 98%.

EderSantana commented 8 years ago

I mean, my accuracy was not problematic. When it converged at all, it would just go up and up. With a total of 10000 updates I would be above 95%, close to 98%.

You made it work with the new Keras? Cool! Would you consider making a PR so I can review it carefully? This way I could go straight to focusing on making the code faster.

jeammimi commented 8 years ago

Well, sorry if I gave you false hope: I just updated Keras again, and in the new LSTM they do some preprocessing on the input that has been moved out of the step function. That makes it harder to adapt to your code, because for the neural Turing machine the input changes at each step, so that preprocessing cannot be done once for the whole sequence.

In the paper from Graves on neural Turing machines they achieve the best results with a feedforward controller. What do you think if, temporarily, I just put a dense layer as the controller? Have you tried it?
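A minimal sketch of what I mean (standalone Theano, with hypothetical W and b weights rather than anything from seya's API):

    import theano.tensor as T

    def ff_controller_step(inp, M, W, b):
        """Hypothetical feedforward controller update: concatenate the
        external input with the memory read and apply one dense layer,
        so no recurrent state (h, c) needs to be threaded through."""
        x = T.concatenate([inp, M], axis=-1)
        return T.tanh(T.dot(x, W) + b)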

EderSantana commented 8 years ago

But it worked with the pip 0.3 version, right? We could just add a note for other people trying to use it.

I tried FF controllers; that wouldn't be hard to add to the code. Sometimes it doesn't converge for me... I don't know why. But check this mod with FF controllers: https://gist.github.com/EderSantana/fb66f36ab8577672ba3c

Let me know what you think

jeammimi commented 8 years ago

I just made the pull request. Apparently the guys from lasagne also have this divergence problem, and they say that it is bad luck with the initial values: (Note: unlucky initialisation of the parameters might lead to a diverging solution witnessed by NaNs.)

EderSantana commented 8 years ago

That is not bad luck, that is division by zero. NaN only appears if you initialize something with zeros, since we compute a cosine similarity and we have vector norms in the denominator. I think I told him that.
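Just to illustrate with plain numpy (not the actual seya code), this is the failure mode and the usual small-epsilon fix in the content addressing:

    import numpy as np

    def cosine_similarity(k, m, eps=1e-6):
        # If either vector is all zeros the denominator is zero and the
        # result would be NaN; a small eps term avoids that.
        return np.dot(k, m) / (np.linalg.norm(k) * np.linalg.norm(m) + eps)

    print(cosine_similarity(np.zeros(5), np.ones(5)))  # 0.0 instead of NaN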

My problem is simply that the method does not learn to use its memory. I saw a presentation by Zaremba and he confirmed my intuitions: if the controller network is too strong, it will simply overfit the training set. We have to make it slightly weak so it learns to rely on the memory and generalizes to larger inputs. Right now I'm more inclined to believe there are better things than neural Turing machines, and apparently so is DeepMind, given that they didn't try hard to follow up on it and instead proposed several alternative architectures.

Thanks for the pull request! I'll check it out and play with it. Btw, would you have time to look into "Learning to Transduce with Unbounded Memory" (http://arxiv.org/abs/1506.02516)? It shouldn't require a lot of modification from the implementation we have for NTMs. And now that you have the intuition of how the whole thing works, maybe you could tackle this problem. I believe this method has a few architectural choices that make it much easier to get working than the NTM for some specific tasks. For example, instead of making a weaker controller and hoping that it will learn to use the memory, this other paper simply forces the controller to ALWAYS read and get everything that was written before.
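For reference, here is a rough numpy sketch of the stack update as I read the equations in that paper (just to convey the idea of the always-performed read, not tested against their setup):

    import numpy as np

    def stack_step(V, s, v_t, d_t, u_t):
        """One step of the differentiable stack, as I read the paper.
        V: (t-1, m) stored vectors, s: (t-1,) strengths,
        v_t: vector pushed this step, d_t: push strength, u_t: pop strength."""
        # the pushed vector is always appended to the value matrix
        V = np.vstack([V, v_t]) if len(s) else v_t[None, :]
        t = V.shape[0]
        # popping removes strength from the top of the stack downwards
        new_s = np.zeros(t)
        for i in range(t - 1):
            new_s[i] = max(0.0, s[i] - max(0.0, u_t - s[i + 1:].sum()))
        new_s[t - 1] = d_t
        # the read always happens: a strength-weighted sum taken from the top
        r = np.zeros(V.shape[1])
        for i in range(t):
            r += min(new_s[i], max(0.0, 1.0 - new_s[i + 1:].sum())) * V[i]
        return V, new_s, r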

jeammimi commented 8 years ago

OK, I took a look at the Transduce article, and indeed it looks like the architecture is similar to the Turing machine one. I will try to implement it next week.

EderSantana commented 8 years ago

I believe this one is solved, right?

jeammimi commented 8 years ago

Yes. I am planning to make the library compatible with keras 0.3.3, but the internals of the LSTM are changing quite fast. I am waiting for a slightly more stable version.


EderSantana commented 8 years ago

keras-1 was just released. Maybe you want to support that. Also, wait a little bit for the Theano and TensorFlow wrappers for cuDNN5. That one will come with optimized RNNs (LSTM, GRU, vanilla).

jeammimi commented 8 years ago

I vaguely heard about that; do you have more info about this optimization? I will take a look at keras-1.


EderSantana commented 8 years ago

Yeah, they wrote a really nice blog post about it: https://devblogs.nvidia.com/parallelforall/optimizing-recurrent-neural-networks-cudnn-5/