automan000 / Convolutional_LSTM_PyTorch

Multi-layer convolutional LSTM with Pytorch

Peephole connections (Wci, Wcf, Wco) gradient update #20

Open Pozimek opened 5 years ago

Pozimek commented 5 years ago

The LSTM paper defines a specific rule for gradient updates of the 'peephole' connections. Specifically:

[...] during learning no error signals are propagated back from gates via peephole connections to CEC

Based on my understanding of the code, the way these three variables are initialized (as asked in Issue 17) is an attempt at implementing this update rule, but I don't see how initializing them as Variables helps. As I read the quoted part of the LSTM paper, the peephole connections should still be updated, but the gradient that updates them should stop there and not flow any further back. If that is the case, then this implementation is incorrect, although it may be that PyTorch does not support such an operation, since .detach() is not suitable for the job.
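(For readers unfamiliar with the semantics being discussed: the behavior the paper asks for, "update the peephole weights but let no error signal flow back into the cell state through them", is exactly what detaching the cell state in the peephole term gives you. A minimal standalone demo, not taken from the repo:)

```python
import torch

# Toy scalar example: c plays the role of the cell state, w the peephole
# weight. With c.detach() in the peephole product, w still receives a
# gradient, but no gradient flows back into c through that path.
c = torch.tensor([2.0], requires_grad=True)
w = torch.tensor([3.0], requires_grad=True)

gate = torch.sigmoid(c.detach() * w)  # peephole term with detached cell state
gate.sum().backward()

print(c.grad)              # None: no error signal reached c via the peephole
print(w.grad is not None)  # True: w still gets its update signal
```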

Pozimek commented 5 years ago

I've come to think that changing L33, L34 and L36 to use c.detach() should fix this issue, though I'm not very confident about it. For example, L33 would become:

ci = torch.sigmoid(self.Wxi(x) + self.Whi(h) + c.detach() * self.Wci)

IMO the gradient should flow through c only via the operations in L35 and L37.
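(Sketching the proposed fix as a whole: below is a toy fully-connected variant of the cell, not the repo's convolutional code, with the peephole weight names Wci/Wcf/Wco kept from the repo. The gates at the lines corresponding to L33, L34 and L36 use a detached cell state, while the cell update and the output, corresponding to L35 and L37, keep full gradient flow through c.)

```python
import torch
import torch.nn as nn

class PeepholeLSTMCell(nn.Module):
    """Toy fully-connected stand-in for the ConvLSTM cell, for illustration only."""
    def __init__(self, nin, nh):
        super().__init__()
        self.Wxi, self.Whi = nn.Linear(nin, nh), nn.Linear(nh, nh, bias=False)
        self.Wxf, self.Whf = nn.Linear(nin, nh), nn.Linear(nh, nh, bias=False)
        self.Wxc, self.Whc = nn.Linear(nin, nh), nn.Linear(nh, nh, bias=False)
        self.Wxo, self.Who = nn.Linear(nin, nh), nn.Linear(nh, nh, bias=False)
        # Peephole weights: trainable, but fed a detached cell state below.
        self.Wci = nn.Parameter(torch.zeros(nh))
        self.Wcf = nn.Parameter(torch.zeros(nh))
        self.Wco = nn.Parameter(torch.zeros(nh))

    def forward(self, x, h, c):
        # Peephole terms use c.detach(): Wci/Wcf/Wco still receive gradients,
        # but no error signal propagates back into c through the gates.
        ci = torch.sigmoid(self.Wxi(x) + self.Whi(h) + c.detach() * self.Wci)
        cf = torch.sigmoid(self.Wxf(x) + self.Whf(h) + c.detach() * self.Wcf)
        # Cell update: gradient flows through c here (the L35 analogue).
        cc = cf * c + ci * torch.tanh(self.Wxc(x) + self.Whc(h))
        co = torch.sigmoid(self.Wxo(x) + self.Who(h) + cc.detach() * self.Wco)
        # Output: gradient flows through cc here (the L37 analogue).
        ch = co * torch.tanh(cc)
        return ch, cc
```

With this structure, backpropagating from the output still updates the peephole weights and still sends a gradient into c via the forget-gate product, but nothing reaches c through the peephole connections themselves.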