crazydonkey200 / tensorflow-char-rnn

Char-RNN implemented using TensorFlow.
MIT License

Feeding same `initial_state` to all layers #19

Closed: vsuarezpaniagua closed this 6 years ago

vsuarezpaniagua commented 6 years ago

In the training phase, `self.initial_state` is created from `multi_cell.zero_state`, and the `final_state` of the last layer is kept:

```python
self.initial_state = create_tuple_placeholders_with_default(
    multi_cell.zero_state(batch_size, tf.float32),
    extra_dims=(None,),
    shape=multi_cell.state_size)
outputs, final_state = tf.contrib.rnn.static_rnn(
    multi_cell, sliced_inputs, initial_state=self.initial_state)
self.final_state = final_state
```

However, in the testing phase (`def sample_seq()`), it seems that all the layers are fed with just the state of the last layer from the previous step, `self.final_state`:

```python
state = session.run(self.final_state,
                    {self.input_data: x, self.initial_state: state})
```

If I'm not wrong, the state of each layer should be kept and then fed back into its corresponding layer at the following steps, rather than feeding the last layer's state to all the layers.

crazydonkey200 commented 6 years ago

Hi, thanks for raising the issue :)

If you take a closer look at `final_state`, or at the value we feed into the session through the `feed_dict`, you'll see that it actually contains the states of all the layers at the last timestep. The "final" in `final_state` refers to the last timestep, not the last layer.
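As a quick illustration, here is a minimal TF 1.x sketch (not the repo's code; `num_layers`, `num_units`, and `batch_size` are made-up values) showing that the state of a `MultiRNNCell` is a tuple holding one `LSTMStateTuple` per layer:

```python
import tensorflow as tf  # assumes TensorFlow 1.x with tf.contrib

num_layers, num_units, batch_size = 3, 128, 2
cells = [tf.contrib.rnn.BasicLSTMCell(num_units) for _ in range(num_layers)]
multi_cell = tf.contrib.rnn.MultiRNNCell(cells)

state = multi_cell.zero_state(batch_size, tf.float32)
print(len(state))   # 3 -- one entry per layer, not just the last layer
print(state[0])     # LSTMStateTuple(c=<tensor>, h=<tensor>) for layer 0
```

Since `static_rnn` returns a state with this same nested structure, feeding `final_state` back in as `initial_state` restores every layer's own state, not just the last layer's.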

So each time we run,

```python
state = session.run(self.final_state,
                    {self.input_data: x, self.initial_state: state})
```

the Char-RNN is unrolled and run for a number of timesteps, and the state of the Char-RNN (including all layers) after processing the last timestep is stored in `state` and passed on to the next call.
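Concretely, the sampling loop just threads that whole multi-layer tuple through each call. A hedged sketch using the names from the snippets above (`num_chars` is a hypothetical sample length, and the update of `x` is elided; it also assumes `self.initial_state` is built from placeholders with zero-state defaults, as `create_tuple_placeholders_with_default` suggests):

```python
# final_state (all layers, last timestep) from one call becomes
# initial_state for the next, so each layer always receives its own
# previous state.
state = session.run(self.initial_state)  # placeholder defaults: the zero state
for _ in range(num_chars):  # num_chars: hypothetical number of samples
    state = session.run(self.final_state,
                        {self.input_data: x, self.initial_state: state})
    # ... sample the next character from the output logits and update x ...
```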

Please let me know if this doesn't make sense.