The previous code unfortunately passed a test case. When all sequences are shorter than the maximum length, the first dimension of `self.noise` has size 1 at the first time step under the TrimZero algorithm. The (lazy) Dropout's `self.noise` is then, presumably, copied across time steps by singleton expansion, so the error `incorrect size: only supporting singleton expansion (size=1)` is avoided, since the first dimension of `self.noise` is always 1.
Note that since Bayesian GRU with TrimZero should use monotonic sampling (the same dropout samples across a batch) for its dropouts, the performance is unchanged as long as the error is not triggered, which depends on the distribution of sequence lengths.
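The expansion behavior can be illustrated with a minimal sketch. This is written in PyTorch rather than the project's actual Lua Torch code, so the shapes, names, and error text are assumptions for illustration only, not the real `self.noise` handling: a noise mask whose first dimension is 1 expands cleanly across time steps, while any other size triggers the analogous size error.

```python
import torch

batch, hidden, max_len = 4, 8, 10

# Dropout noise sampled once with a singleton first dimension, as at the first
# TrimZero time step when every sequence is shorter than max_len.
noise_singleton = torch.bernoulli(torch.full((1, batch, hidden), 0.5))

# Expansion succeeds: the same mask is effectively copied across all time steps.
expanded = noise_singleton.expand(max_len, batch, hidden)
print(expanded.shape)  # torch.Size([10, 4, 8])

# If the first dimension were some trimmed length other than 1 (e.g. 7),
# expanding to max_len fails, analogous to Torch's
# "incorrect size: only supporting singleton expansion (size=1)".
noise_trimmed = torch.bernoulli(torch.full((7, batch, hidden), 0.5))
try:
    noise_trimmed.expand(max_len, batch, hidden)
except RuntimeError as e:
    print("expand failed:", e)
```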