@LeonardKnuth The backward pass will not work with ConcatTable, as AbstractRecurrent instances require backward to be called in the reverse order of forward. Basically, don't use ConcatTable :)
@nicholas-leonard Thanks for your explanation. However, in my program above, each ConcatTable only contains one LSTM, and I then use nn.Sequential to connect all the ConcatTables. In that case, the order of forward and backward should be the same as what AbstractRecurrent requires, is that right? Thanks.
@LeonardKnuth Sorry, I hadn't caught that. OK, then the order of forward/backward should be the same. I think the problem is instead caused by updateParameters. In the ConcatTable implementation, calling parameters() will return the LSTM's params and gradParams 3 times (once for each occurrence of the LSTM in the network). So the call to updateParameters will effectively add those gradients 3 times instead of just once. You can alleviate this by using getParameters to obtain consolidated tensors of params and gradParams, which shouldn't contain duplicates. To update, you call params:add(-lr, gradParams).
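A minimal sketch of that update, assuming `model` is the nn.Sequential holding the ConcatTables and that `input`, `target`, `criterion`, and the learning rate `lr` are already defined:

```lua
-- getParameters() flattens all weights into single tensors and,
-- unlike parameters(), maps duplicated modules to the same storage,
-- so each gradient is counted only once.
-- Call it once, before training, and never again afterwards.
local params, gradParams = model:getParameters()

-- one training step
gradParams:zero()
local output = model:forward(input)
local err = criterion:forward(output, target)
local gradOutput = criterion:backward(output, target)
model:backward(input, gradOutput)

-- vanilla SGD update on the consolidated tensors
params:add(-lr, gradParams)
```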
@nicholas-leonard Thank you very much for your detailed explanation. One more thing: is there any fundamental difference between adding the gradients 3 times and adding them once? Can we adjust the learning rate (e.g., 0.0001 for the concat method and 0.0003 for the sequencer method) to make the two methods consistent? Thanks a lot.
@LeonardKnuth That trick will only work if the LSTM parameters are the only parameters returned by a call to parameters(). This seems to be the case for you, so yeah, that could make them consistent.
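In other words (a hedged sketch; `concatModel` and `seqModel` are hypothetical names for the two variants): since the duplicated gradParams make updateParameters apply each LSTM gradient 3 times, a 3x smaller learning rate on the concat side should give the same effective step.

```lua
local lr = 0.0001
concatModel:updateParameters(lr)   -- gradient applied 3 times => effective step of 3*lr
seqModel:updateParameters(3 * lr)  -- gradient applied once    => same effective step
```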
Hi everyone,
I am trying to build a recurrent neural network using nn.LSTM in the following two ways: one uses nn.Sequencer, and the other uses nn.ConcatTable. Although I fixed the input table and the parameters of the two nets, their outputs and errors differ. I am confused: is there a difference between these two implementations? If so, what is it? Thanks a lot.
The two different implementations:
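(The code attachments are not visible here; below is only a hedged guess at the shape of the two methods. `inputSize`, `hiddenSize`, `seqLen`, `batchSize`, the shared `lstm` instance, and the manual per-step stepping are all assumptions rather than taken from the original post.)

```lua
require 'rnn'

local inputSize, hiddenSize, seqLen, batchSize = 4, 4, 3, 2
local inputs = {}
for t = 1, seqLen do inputs[t] = torch.randn(batchSize, inputSize) end

-- Method 1: one LSTM driven by nn.Sequencer over the whole
-- table of time-step inputs.
local lstm = nn.LSTM(inputSize, hiddenSize)
local seqModel = nn.Sequencer(lstm)
local outputs1 = seqModel:forward(inputs)

-- Method 2: the same LSTM instance inside one ConcatTable per
-- time step, all held in an nn.Sequential container and stepped
-- by hand. The container is what parameters()/updateParameters
-- sees, hence the tripled gradParams discussed above.
local concatModel = nn.Sequential()
local steps = {}
for t = 1, seqLen do
   steps[t] = nn.ConcatTable():add(lstm)
   concatModel:add(steps[t])
end
local outputs2 = {}
for t = 1, seqLen do
   -- ConcatTable wraps the LSTM output in a single-element table
   outputs2[t] = steps[t]:forward(inputs[t])[1]
end
```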
The corresponding outputs:
After several trials, here is my conclusion and a guess:
The forward passes of the two implementations are the same when the sequencer method turns on remember('both'), but they seem to have slightly different backward passes. Is there any randomness in the backward pass, or do they use totally different backward methods?
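For reference, the remember mode mentioned above is set on the Sequencer like this (minimal sketch):

```lua
local seq = nn.Sequencer(nn.LSTM(10, 10))
-- 'both': keep the hidden state between consecutive calls to
-- forward, in both training and evaluation mode. The default
-- ('neither') calls forget() before each new sequence, which is
-- one way a Sequencer can diverge from a manually stepped LSTM
-- that never forgets.
seq:remember('both')
```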
Thanks.