Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

Resolve #376 #378

Closed jnhwkim closed 7 years ago

jnhwkim commented 7 years ago

Unfortunately, the previous code passed the test case. When all sequences are shorter than the maximum length, the first dimension of self.noise has size 1 at the first time step in the TrimZero algorithm. (Lazy) Dropout's self.noise is then, presumably, copied across time steps. As a result, it avoids the error "incorrect size: only supporting singleton expansion (size=1)", since the first dimension of self.noise always has size 1.

Note that since Bayesian GRU with TrimZero should use monotonic sampling (the same dropout sample across a batch) for dropout, the performance is the same as long as no error occurs due to the distribution of sequence lengths.
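
To illustrate the masking effect described above, here is a minimal sketch in plain Python, not the rnn library's Lua code. The helpers `expand_rows` and `apply_mask` are hypothetical names, and the lists stand in for tensors. It assumes only the rule quoted in the error message: a mask can be expanded along its first dimension only when that dimension has size 1 (or already matches the batch).

```python
# Hypothetical illustration of singleton expansion; not the rnn library's code.

def expand_rows(noise, batch_size):
    """Expand a list of rows along the first dimension, singleton-style."""
    if len(noise) == batch_size:
        return [list(row) for row in noise]
    if len(noise) == 1:
        # Singleton expansion: reuse the single row for every batch entry.
        return [list(noise[0]) for _ in range(batch_size)]
    raise ValueError(
        "incorrect size: only supporting singleton expansion (size=1)"
    )


def apply_mask(batch, noise):
    """Multiply each batch row by the (expanded) dropout mask."""
    mask = expand_rows(noise, len(batch))
    return [[x * m for x, m in zip(row, mask_row)]
            for row, mask_row in zip(batch, mask)]


# A mask whose first dimension is 1, as at the first time step above.
shared_noise = [[1.0, 0.0, 1.0]]
# A later step with two active sequences: the one-row mask still expands.
step2 = apply_mask([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], shared_noise)

# A mask sampled with two rows cannot be reused for a batch of three.
wide_noise = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
try:
    apply_mask([[1.0] * 3, [2.0] * 3, [3.0] * 3], wide_noise)
    failed = False
except ValueError:
    failed = True
```

In this sketch every batch row receives the same mask, which corresponds to the "same dropout sample across a batch" behaviour mentioned above. The size mismatch only surfaces when the stored mask has more than one row, which is why a test whose sequences are all shorter than the maximum length would not trigger it.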