I'm currently working on a recurrent network that processes an input which is the output of the previous step, and produces an output to be used as the input of the next step. At every step there is also a loss to compute. However, nn.Sequencer wraps all the time steps into a single forward call, and I haven't figured out how to link input[t] with output[t-1] inside it. I'm now using nn.Recurrent() to implement this, but I don't know if it is right. The code looks like this:
-- Model --
r = nn.Recurrent(..., ..., nn.FastLSTM(..., ...), nStep)
net = nn.Sequential():add(r):add(... some output module ...)
-- Forward --
for s = 1, nStep do
o[s] = net:forward(o[s-1])
end
-- BPTT --
for s = nStep, 1, -1 do
...
-- the gradOutput of this step is the sum of the gradInput from the next step and the gradient from this step's criterion
gradOutput[s] = criterion:backward(o[s], target[s]) + gradInput[s + 1]
gradInput[s] = net:backward(o[s-1], gradOutput[s])
...
end
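To make what I mean more concrete, here is a minimal self-contained version of the same loop with made-up sizes (FastLSTM plus a Linear output layer, batch size 2, 12 features, 5 steps). It is only a sketch of my understanding: I wrapped the Sequential in nn.Recursor so that the non-recurrent Linear layer gets its own clone per time step, and I assume a version of the rnn package where backward can be called step by step in reverse time order (older versions used backwardThroughTime instead):

require 'nn'
require 'rnn'

local featSize, hiddenSize, nStep, lr = 12, 16, 5, 0.01

-- recurrent core: the output of step t-1 is fed back as the input of step t.
-- nn.Recursor gives the non-recurrent Linear layer a clone per time step,
-- so step-by-step backward through it stays correct (my reading of the docs).
local net = nn.Recursor(nn.Sequential()
   :add(nn.FastLSTM(featSize, hiddenSize))
   :add(nn.Linear(hiddenSize, featSize)))
local criterion = nn.MSECriterion()

local o, target, gradOutput, gradInput = {}, {}, {}, {}
o[0] = torch.zeros(2, featSize)               -- initial "output" fed in at t = 1
for s = 1, nStep do target[s] = torch.randn(2, featSize) end

net:zeroGradParameters()

-- forward: the input of step s is the output of step s-1
for s = 1, nStep do
   o[s] = net:forward(o[s-1]):clone()         -- clone, since forward reuses its output buffer
end

-- BPTT: the gradOutput at step s is the criterion gradient at s plus the
-- gradInput that flowed back through step s+1
local loss = 0
gradInput[nStep + 1] = torch.zeros(2, featSize)
for s = nStep, 1, -1 do
   loss = loss + criterion:forward(o[s], target[s])
   gradOutput[s] = criterion:backward(o[s], target[s]):clone():add(gradInput[s + 1])
   gradInput[s] = net:backward(o[s-1], gradOutput[s]):clone()
end

net:updateParameters(lr)
net:forget()                                  -- reset the stored steps before the next sequence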
I'm wondering whether my code is suitable for what I need, or whether I have missed something here?
And here is another question: is it possible to use this library to implement a recurrent module that receives a different input size at t = 1? E.g. when t = 1 the input is a 2 x 10 tensor, and when t > 1 the inputs are 2 x 12 tensors.
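In case it helps clarify the question, this is the kind of workaround I can imagine (I don't know of a built-in way): a separate adapter module applied only at t = 1 that projects the 2 x 10 input into the 2 x 12 space the recurrent core expects. The module names and sizes below are just illustrative assumptions, not anything from the library:

require 'nn'
require 'rnn'

local firstSize, stepSize, hiddenSize = 10, 12, 16

local adapter = nn.Linear(firstSize, stepSize)     -- applied only at t = 1
local core = nn.Recursor(nn.Sequential()
   :add(nn.FastLSTM(stepSize, hiddenSize))
   :add(nn.Linear(hiddenSize, stepSize)))

local x1 = torch.randn(2, firstSize)               -- 2 x 10 input at t = 1
local x2 = torch.randn(2, stepSize)                -- 2 x 12 inputs at t > 1

local o1 = core:forward(adapter:forward(x1))       -- first step goes through the adapter
local o2 = core:forward(x2)                        -- later steps are fed in directly

During BPTT the gradient at step 1 would then also have to be pushed back through the adapter (adapter:backward(x1, ...)), and I am not sure whether that plays nicely with the library's bookkeeping, hence the question.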