Closed vsuarezpaniagua closed 6 years ago
Hi, thanks for raising the issue :)
I think if you take a closer look at final_state,
or the value we feed into the session through the feed_dict, it actually contains the states of all the layers at the last timestep. The "final" in final_state
means the last timestep, not the last layer.
So each time we run,
state = session.run(self.final_state, {self.input_data: x, self.initial_state: state})
the Char-RNN is unrolled and run for a number of timesteps, and the state of the Char-RNN (including all layers) after processing the last timestep is kept in state
and passed on.
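To make that concrete, here is a minimal NumPy sketch of the same idea (illustrative only, not the repository's code): a 2-layer RNN is unrolled for a few timesteps, and the state carried between runs is a tuple with one entry per layer, exactly like the state tuple a MultiRNNCell produces. After the loop, that tuple holds every layer's state at the last timestep — the analogue of final_state.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, LAYERS, STEPS = 4, 2, 5

# Random stand-ins for the trained per-layer weights.
W_in  = [rng.standard_normal((HIDDEN, HIDDEN)) for _ in range(LAYERS)]
W_rec = [rng.standard_normal((HIDDEN, HIDDEN)) for _ in range(LAYERS)]

def step(x, state):
    """One timestep through every layer (rough analogue of calling the
    stacked cell once). Returns the top-layer output and the new
    per-layer state tuple."""
    new_state, inp = [], x
    for l in range(LAYERS):
        h = np.tanh(inp @ W_in[l] + state[l] @ W_rec[l])
        new_state.append(h)
        inp = h                     # output of layer l feeds layer l+1
    return inp, tuple(new_state)

# Analogue of multi_cell.zero_state: one zero vector per layer.
state = tuple(np.zeros(HIDDEN) for _ in range(LAYERS))
for t in range(STEPS):
    x = rng.standard_normal(HIDDEN)
    out, state = step(x, state)

# `state` now plays the role of final_state: it contains the state of
# EVERY layer at the LAST timestep, not just the last layer's state.
print(len(state))   # one entry per layer
```

So when the loop feeds `state` back in as the next initial state, each layer receives its own state from the previous run.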
Please let me know if this doesn't make sense.
In the training phase the self.initial_state is used as multi_cell.zero_state and the final_state of the last layer is kept:
However, in the testing phase (def sample_seq()) it seems that all the layers are fed just with the state of the last layer of the previous step, self.final_state, as:
If I'm not wrong, I think the state of each layer must be kept and then fed into its corresponding layer at the following steps, rather than feeding the last layer's state to all the layers.
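For what it's worth, if a sampling loop really did broadcast only the last layer's state to every layer, its outputs would diverge from the correct ones. This NumPy sketch (hypothetical names, not the repo's code) contrasts the two behaviours:

```python
import numpy as np

rng = np.random.default_rng(1)
HIDDEN, LAYERS = 4, 2

# Random stand-ins for the trained per-layer weights.
W_in  = [rng.standard_normal((HIDDEN, HIDDEN)) for _ in range(LAYERS)]
W_rec = [rng.standard_normal((HIDDEN, HIDDEN)) for _ in range(LAYERS)]

def step(x, state):
    """One timestep through the stacked cell; returns top-layer output
    and the new per-layer state tuple."""
    new_state, inp = [], x
    for l in range(LAYERS):
        h = np.tanh(inp @ W_in[l] + state[l] @ W_rec[l])
        new_state.append(h)
        inp = h
    return inp, tuple(new_state)

xs = [rng.standard_normal(HIDDEN) for _ in range(3)]
zero = tuple(np.zeros(HIDDEN) for _ in range(LAYERS))

# Correct sampling loop: carry the full per-layer state tuple forward.
state = zero
for x in xs:
    out_ok, state = step(x, state)

# Suspected bug: feed only the LAST layer's state to every layer.
state = zero
for x in xs:
    out_bad, new_state = step(x, state)
    state = tuple(new_state[-1] for _ in range(LAYERS))

print(np.allclose(out_ok, out_bad))   # False: the trajectories diverge
```

From the second timestep on, layer 0 in the buggy loop is conditioned on layer 1's state instead of its own, so the two loops produce different outputs.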