Closed OptimusPrimeCao closed 8 years ago
@OptimusPrimeCao Basically, MaskZero is applied to the internal recurrentModule of the RNN at each time-step. At https://github.com/Element-Research/rnn/blob/master/MaskZero.lua#L77 it zeros the output for each (step, row) pair whose input is zero. That effectively forgets by creating a discontinuity between the previous and next state. The same principle is applied to the corresponding gradients with respect to the input: https://github.com/Element-Research/rnn/blob/master/MaskZero.lua#L85.
Caveat: this does not mean that the outputs and gradInputs of the internal modules making up a recurrentModule will all be zero as well. MaskZero only affects the immediate gradInput and output of the module that it encapsulates.
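To make the principle concrete, here is a minimal framework-agnostic sketch in Python (not the actual Torch/Lua code from MaskZero.lua; the `zero_mask` function and the sample data are made up for illustration). It shows the core idea: any row whose input is all zeros has its output zeroed, and on the backward pass the same masking would be applied to gradInput.

```python
def zero_mask(inputs, values):
    """Zero out each row of `values` whose corresponding input row is all zeros.

    `inputs` and `values` are lists of equal-length rows (batch x features).
    """
    masked = []
    for inp, val in zip(inputs, values):
        if all(x == 0 for x in inp):          # row is zero padding -> forget it
            masked.append([0.0] * len(val))
        else:
            masked.append(list(val))
    return masked

# Batch of 3 rows; the second row is zero padding.
inputs  = [[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]
outputs = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]

print(zero_mask(inputs, outputs))
# -> [[0.5, 0.5], [0.0, 0.0], [0.2, 0.8]]
# On the backward pass the same masking is applied to gradInput,
# so no gradient flows through the padded (step, row) positions.
```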
@OptimusPrimeCao I updated the doc: https://github.com/Element-Research/rnn#maskzero . Thanks for bringing this up!
Hi @nicholas-leonard, the last sentence says that the hidden state is reset. However, I can't find the implementation of that part in your code. Could you please point it out? Thanks!