Closed — agethen closed this issue 8 years ago.
Narrowing down the problem: the bug seems to occur because of the implicit Split layer that is spawned to clone the input blob "x" for the four convolutions "input"/"forget"/"output"/"gate". Adding an extra 1x1 convolutional layer before the implicit Split layer fixes the issue.
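A minimal prototxt sketch of the workaround described above — all layer/blob names and channel counts here are assumptions for illustration, not taken from the actual model:

```protobuf
# Hypothetical workaround: insert a 1x1 convolution so that the blob
# consumed by the four gate convolutions is produced by an explicit
# layer rather than fed directly from "x".
layer {
  name: "x_proj"          # assumed name
  type: "Convolution"
  bottom: "x"
  top: "x_proj"
  convolution_param {
    num_output: 64        # assumed channel count
    kernel_size: 1
  }
}
# The "input"/"forget"/"output"/"gate" layers would then use
# bottom: "x_proj" instead of bottom: "x"; Caffe still inserts an
# implicit Split to fan "x_proj" out to the four consumers.
```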
Choices on how to proceed:
Issue fixed by replacing the separate convolutional layers for the three gates plus the gate activation with a single convolution that has four times the channels.
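The fix above can be sketched in prototxt as follows — a single convolution producing all four gate pre-activations, followed by a Slice layer to split them apart. Layer names, kernel size, and channel counts are assumptions, not from the issue:

```protobuf
# Hypothetical sketch: one convolution with 4x the channels replaces
# the separate "input"/"forget"/"output"/"gate" convolutions.
layer {
  name: "gates"
  type: "Convolution"
  bottom: "x"
  top: "gates"
  convolution_param {
    num_output: 256       # 4 * 64, assuming 64 channels per gate
    kernel_size: 3        # assumed kernel size
    pad: 1
  }
}
# Slice the combined output back into the four per-gate blobs
# along the channel axis.
layer {
  name: "slice_gates"
  type: "Slice"
  bottom: "gates"
  top: "input"
  top: "forget"
  top: "output"
  top: "gate"
  slice_param { axis: 1 }
}
```

Because only one layer consumes "x", no implicit Split is spawned for it, which avoids the problematic code path.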
A user reported that the gradients propagated to the underlying layers appear to be all zero. Two possible reasons may be: