flomlo / ntm_keras

An implementation of the Neural Turing Machine as a keras recurrent layer.
BSD 3-Clause "New" or "Revised" License

I cannot load saved model #5

Open ylmeng opened 7 years ago

ylmeng commented 7 years ago

I implemented the NeuralTuringMachine layer in my model and trained it. It works fine, but when I tried to load the saved model I got "Unknown layer NeuralTuringMachine". I have tried different ways to load the model, including model_from_json and load_model, all with the same problem. Then I used the custom_objects parameter:

    model = model_from_json(open(model_destination + ".arch.json").read(),
                            custom_objects={"NeuralTuringMachine": NeuralTuringMachine})

But got an error:

    /home/ymeng/anaconda2/lib/python2.7/site-packages/keras/engine/topology.pyc in from_config(cls, config)
       1250             A layer instance.
       1251         """
    -> 1252         return cls(**config)
       1253
       1254     def count_params(self):

    TypeError: __init__() takes at least 2 arguments (2 given)

However, I can use load_weights if I build an untrained model first (basically creating a new model each time).

I suppose it is just a configuration problem and can be fixed. However, I wonder whether you have ever tried loading a saved model? The examples only train models but never load them.

flomlo commented 7 years ago

Ah yes, the loading/saving problem. Admittedly I haven't done that so far via the Keras load/save functions.

I currently don't completely understand how that works in Keras. I suspect it might work better if ntm.py had code like this (copied from recurrent.py in Keras):

def get_config(self):
    config = {'units': self.units,
              'activation': activations.serialize(self.activation),
              'recurrent_activation': activations.serialize(self.recurrent_activation),
              'use_bias': self.use_bias,
              'kernel_initializer': initializers.serialize(self.kernel_initializer),
              'recurrent_initializer': initializers.serialize(self.recurrent_initializer),
              'bias_initializer': initializers.serialize(self.bias_initializer),
              'unit_forget_bias': self.unit_forget_bias,
              'kernel_regularizer': regularizers.serialize(self.kernel_regularizer),
              'recurrent_regularizer': regularizers.serialize(self.recurrent_regularizer),
              'bias_regularizer': regularizers.serialize(self.bias_regularizer),
              'activity_regularizer': regularizers.serialize(self.activity_regularizer),
              'kernel_constraint': constraints.serialize(self.kernel_constraint),
              'recurrent_constraint': constraints.serialize(self.recurrent_constraint),
              'bias_constraint': constraints.serialize(self.bias_constraint),
              'dropout': self.dropout,
              'recurrent_dropout': self.recurrent_dropout,
              'implementation': self.implementation}
    base_config = super(LSTM, self).get_config()
    del base_config['cell']
    return dict(list(base_config.items()) + list(config.items()))

@classmethod
def from_config(cls, config):
    if 'implementation' in config and config['implementation'] == 0:
        config['implementation'] = 1
    return cls(**config)

I cannot guarantee it, however. If you think it's within your scope to write the analogue for ntm.py, I would gladly accept the pull request. Otherwise I will do it myself, but not sooner than Thursday, I fear.
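Roughly, the analogue for ntm.py might look like the sketch below. This is untested, and I am only guessing at the constructor arguments (output_dim, n_slots, m_depth, shift_range, read_heads, write_heads, batch_size); whatever __init__ in ntm.py actually takes is what needs to end up in the config. A controller passed in as a full Keras model would also need special treatment, which this sketch ignores.

    def get_config(self):
        # Everything that __init__ needs must appear here, otherwise
        # from_config, which just calls cls(**config), cannot rebuild the layer.
        config = {'output_dim': self.output_dim,
                  'n_slots': self.n_slots,
                  'm_depth': self.m_depth,
                  'shift_range': self.shift_range,
                  'read_heads': self.read_heads,
                  'write_heads': self.write_heads,
                  'batch_size': self.batch_size}
        base_config = super(NeuralTuringMachine, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    @classmethod
    def from_config(cls, config):
        return cls(**config)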

Another immediate workaround would be to save the model, and then do "manual" loading: Load only the weights:

    m.load_weights("model.ckpt")
    k, b = m.get_weights()

Then build the model and place the weights. This is how I've done it in the single instance where I've used model loading so far :)
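Spelled out, that workaround is roughly the following sketch. build_model here is just a placeholder for whatever function you already use to construct the architecture; it is not part of this repo.

    # Rebuild exactly the same architecture (including the NTM layer) and
    # pour the saved weights back in, instead of relying on load_model().
    model = build_model(batch_size=32)   # your own model-construction function
    model.load_weights("model.ckpt")     # weights previously saved via save_weights()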

ylmeng commented 7 years ago

Thanks. It is not a major issue, just a little inelegant. BTW, I have some other questions about the implementation:

1) For each batch, does the code clear the memory that the NTM reads from and writes to? In my task each batch corresponds to a separate document, so I want the memory to be cleared.

2) I see batch_size is passed in as a parameter. Is it possible to leave it unspecified? I mean, if the LSTM controller is not stateful, or the controller is not an RNN, it should be OK to have a different batch size each time.

flomlo commented 7 years ago

1) Yes. In some future update other behaviour might be implemented via stateful=True, but right now that is not a priority.

2) Unfortunately, right now it is necessary in order to initialize the previous attention vectors; I need to know the size at build time. But I've learned a lot about Keras over the last few weeks, so it might be that a revisit of the code turns up a solution.
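To illustrate why (a simplified sketch, not the actual code in ntm.py; the names and sizes are made up): the initial attention vector and the memory matrix are allocated with the batch size baked into their shape, so that dimension has to be a concrete number when the layer is built.

    from keras import backend as K

    batch_size, n_slots, m_depth = 32, 128, 20   # example sizes

    # Both initial state tensors carry the batch size in their first axis,
    # which is why it cannot stay undefined at build time.
    init_attention = K.zeros((batch_size, n_slots), name='init_attention')
    init_memory = K.ones((batch_size, n_slots, m_depth), name='init_memory')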

ylmeng commented 7 years ago

Thanks. Just one more question: why do you make the LSTM controller stateful=True by default? I mean, what is the benefit of having a stateful LSTM here? If I understand it correctly, a stateful LSTM will apply the state corresponding to input i of batch n to input i of batch n+1. In most cases this would not make sense? (Unless every batch contains the same types of data, in order.)

flomlo commented 7 years ago

The original idea was to make the controller stateful and then apply sequences of length 1 (a logical step on the level of the NTM) to the inner controller. I however missed the detail that Keras does not provide training over several steps.

To sum it up, stateful controllers are currently broken. In some future update the state of the controller will be saved manually, and then carried by the ntm-layer between steps.

But for that I still need to finalize a Keras patch which fixes the behaviour of recurrent layers for sequence_length=1; it currently crashes for the most dubious reasons.

ylmeng commented 7 years ago

I see. Since NTM inherits from the Recurrent class, it probably supports multiple steps by default. When the controller is an LSTM, it reads from and writes to M at every time step, correct? If it is a Dense controller, then is it not time-distributed by default (i.e. at all time steps the inputs are connected to the Dense layer by the same set of weights)?

I have data of shape (batch_size, timesteps, data_dim). For my task I actually don't want the controller to access memory at every time step. Instead, I want every time series (not every item at each time step) to be encoded and written to memory. Should I use an external LSTM to process the data into (batch_size, 1, data_dim), and then convert it to (1, batch_size, data_dim)? Then the batch_size would be equivalent to the time steps in normal applications and could be fed to the NTM. Of course there could be two NTMs too: one for the timesteps, and the other for each encoded time series. I have not implemented it. Your opinion will be appreciated. Thanks for your code again, it saved me time.

flomlo commented 7 years ago

I'm not completely sure I understand you correctly.

It may clear things up that, from the outside perspective, you're perfectly fine feeding (bs, sequence_length, input_dim) data to the NTM layer. That the controller on the inside only sees sequence_length-many sequences of length 1 (called steps) is an implementation detail; you don't have to worry about it (I worry about that enough myself ;) ). In the case of a stateless controller it is indeed the same controller for each step (like TimeDistributed with memory access); with a stateful controller (like an LSTM, or an inner NTM if we're in the mood) we have to carry the controller's state around between steps and force Keras/TensorFlow not to evaluate everything in parallel (I'm currently fighting with that).

If you really do not want the controller to access the memory at each step (did I understand that correctly?), then I fear the NTM is not the perfect tool for you. Memory access at every timestep is an essential idea of the NTM.

You may of course use the codebase at your discretion in your own project, if you want.

ylmeng commented 7 years ago

In fact I have two kinds of time steps, a larger scope and a smaller scope. As a simplified version, you can think of (documents, sentences_in_doc, words_in_sentences, word_vectors). Since a batch always corresponds to one document, that dimension can be ignored. What I want to do is use an external LSTM to process the words in each sentence, obtaining (documents, sentences_in_doc, sentence_representations), and then use the NTM to process the sentences_in_doc. The idea is that the sentences in a doc form a loose time series, so the NTM may be a good tool because it has memory.
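In rough Keras terms, I am imagining something like the sketch below. The sizes are made up, and the import path for the NTM layer and its constructor arguments are only guesses on my part; I have not checked them against ntm.py:

    from keras.models import Sequential
    from keras.layers import LSTM, TimeDistributed
    from ntm import NeuralTuringMachine   # import path / signature may differ

    sents_per_doc, words_per_sent, word_dim = 50, 30, 100   # made-up sizes

    model = Sequential()
    # Encode each sentence (a sequence of word vectors) into one vector.
    model.add(TimeDistributed(LSTM(128),
                              batch_input_shape=(1, sents_per_doc, words_per_sent, word_dim)))
    # Let the NTM walk over the sequence of sentence representations, so it
    # touches memory once per sentence rather than once per word.
    model.add(NeuralTuringMachine(64, n_slots=50, m_depth=20, batch_size=1,
                                  return_sequences=True))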

One thing I am not completely clear about is: if the LSTM controller does not return a sequence (return_sequences=False), but the NTM is set to return_sequences=False, what does the NTM return at every time step? In your code I saw

    if self.controller.output_shape == 3:
        controller_output = controller_output[:,0,:]

What does it do?

flomlo commented 6 years ago

Sorry for the delay.

If return_sequences=False is set, the NTM calculates everything like it would normally, but only the last value in the sequence is returned.

This behaviour is inherited from the Recurrent class defined in Keras' recurrent.py.
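For illustration (made-up sizes), the same return_sequences semantics can be seen with a plain Keras LSTM, since the NTM simply inherits the behaviour:

    import numpy as np
    from keras.models import Sequential
    from keras.layers import LSTM

    seq = Sequential([LSTM(4, return_sequences=True, input_shape=(10, 8))])
    last = Sequential([LSTM(4, return_sequences=False, input_shape=(10, 8))])

    x = np.zeros((16, 10, 8), dtype='float32')
    print(seq.predict(x).shape)    # (16, 10, 4) -- one output per timestep
    print(last.predict(x).shape)   # (16, 4)     -- only the last timestep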

The line of code you posted: that is a (very crappy) way to discern between stateful controllers (output with timesteps) and dense controllers (output_shape == 2). It will be reworked sometime in the future.

ylmeng commented 6 years ago

I noticed you use two constants here:

    init_old_ntm_output = K.ones((self.batch_size, self.output_dim), name="init_old_ntm_output") * 0.42
    init_M = K.ones((self.batch_size, self.n_slots, self.m_depth), name='main_memory') * 0.042

Where do the 0.42 and 0.042 come from? Thanks.