Merlin uses the `theano.scan` function incorrectly in gating.py. I have confirmed the bug with my local code.
Theano's `scan` function loops over the first dimension of its input sequences. However, the batch input has three dimensions, (num_batches, num_timesteps, num_dimensions), so `scan` iterates over the batch axis instead of over time. The input should be transposed to shape (num_timesteps, num_batches, num_dimensions) before calling `scan`, and the output should be transposed back afterwards.
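A minimal sketch of the problem and the fix, assuming a NumPy stand-in rather than Theano itself: `scan_like` is a hypothetical helper that mimics how `scan` walks axis 0 of a sequence, and `gated_step` is a toy running-sum recurrence, not Merlin's actual gating unit. Note that the buggy call returns an array of the same shape as the correct one, which is why the mistake does not raise an error. In Theano the transpose would be done with something like `x.dimshuffle(1, 0, 2)`.

```python
import numpy as np


def scan_like(step, sequence, h0):
    """Hypothetical stand-in for theano.scan: iterate over axis 0 of
    `sequence`, thread the state through each step, and stack the
    per-step outputs along a new axis 0."""
    h = h0
    outputs = []
    for x_t in sequence:  # iterating a NumPy array walks axis 0
        h = step(x_t, h)
        outputs.append(h)
    return np.stack(outputs)


def gated_step(x_t, h_prev):
    # Toy recurrence (assumption): running sum over the looped axis.
    return h_prev + x_t


num_batches, num_timesteps, num_dimensions = 2, 5, 3
rng = np.random.default_rng(0)
x = rng.random((num_batches, num_timesteps, num_dimensions))

# Buggy: scanning the batch-major input loops over the batch axis.
wrong = scan_like(gated_step, x, np.zeros((num_timesteps, num_dimensions)))

# Fixed: move time to axis 0, scan, then move it back.
x_tm = np.transpose(x, (1, 0, 2))  # (num_timesteps, num_batches, num_dimensions)
out_tm = scan_like(gated_step, x_tm, np.zeros((num_batches, num_dimensions)))
out = np.transpose(out_tm, (1, 0, 2))  # (num_batches, num_timesteps, num_dimensions)
```

With the toy recurrence, `out` equals a cumulative sum over the time axis, while `wrong` accumulates across the batch instead.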
Note: I have only checked the gating.py file.