Closed — agethen closed this issue 8 years ago.
Narrowing down the problem: the bug seems to occur because of the implicit Split layer that is spawned to clone the input blob "x" for the four convolutions "input"/"forget"/"output"/"gate". Adding an extra 1x1 convolutional layer before the implicit Split layer fixes the issue.
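A minimal prototxt sketch of the workaround described above — all layer/blob names and channel counts here are assumptions for illustration, not taken from the actual model:

```protobuf
# Hypothetical workaround: insert a 1x1 convolution so that the blob
# consumed by the four gate convolutions is produced by an explicit
# layer rather than fed directly from "x".
layer {
  name: "x_proj"          # assumed name
  type: "Convolution"
  bottom: "x"
  top: "x_proj"
  convolution_param {
    num_output: 64        # assumed channel count
    kernel_size: 1
  }
}
# The "input"/"forget"/"output"/"gate" layers would then use
# bottom: "x_proj" instead of bottom: "x"; Caffe still inserts an
# implicit Split to fan "x_proj" out to the four consumers.
```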
Choices on how to proceed:
Issue fixed by replacing the separate convolutional layers for the three gates plus the gate activation with a single convolution that has four times the channels.
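The fix above can be sketched in prototxt as follows — a single convolution producing all four gate pre-activations, followed by a Slice layer to split them apart. Layer names, kernel size, and channel counts are assumptions, not from the issue:

```protobuf
# Hypothetical sketch: one convolution with 4x the channels replaces
# the separate "input"/"forget"/"output"/"gate" convolutions.
layer {
  name: "gates"
  type: "Convolution"
  bottom: "x"
  top: "gates"
  convolution_param {
    num_output: 256       # 4 * 64, assuming 64 channels per gate
    kernel_size: 3        # assumed kernel size
    pad: 1
  }
}
# Slice the combined output back into the four per-gate blobs
# along the channel axis.
layer {
  name: "slice_gates"
  type: "Slice"
  bottom: "gates"
  top: "input"
  top: "forget"
  top: "output"
  top: "gate"
  slice_param { axis: 1 }
}
```

Because only one layer consumes "x", no implicit Split is spawned for it, which avoids the problematic code path.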
A user reported that the gradients propagated to the underlying layers appear to be all zero. Two possible reasons may be: