Closed: robotsorcerer closed this issue 8 years ago.
I rearranged the output into a table of the form:
outputs
{
1 :
{
1 : DoubleTensor - size: 6x1
2 : DoubleTensor - size: 6x1
3 : DoubleTensor - size: 6x1
4 : DoubleTensor - size: 6x1
5 : DoubleTensor - size: 6x1
6 : DoubleTensor - size: 6x1
}
2 :
{
1 : DoubleTensor - size: 6x1
2 : DoubleTensor - size: 6x1
3 : DoubleTensor - size: 6x1
4 : DoubleTensor - size: 6x1
5 : DoubleTensor - size: 6x1
6 : DoubleTensor - size: 6x1
}
3 :
{
1 : DoubleTensor - size: 6x1
2 : DoubleTensor - size: 6x1
3 : DoubleTensor - size: 6x1
4 : DoubleTensor - size: 6x1
5 : DoubleTensor - size: 6x1
6 : DoubleTensor - size: 6x1
}
4 :
{
1 : DoubleTensor - size: 6x1
2 : DoubleTensor - size: 6x1
3 : DoubleTensor - size: 6x1
4 : DoubleTensor - size: 6x1
5 : DoubleTensor - size: 6x1
6 : DoubleTensor - size: 6x1
}
5 :
{
1 : DoubleTensor - size: 6x1
2 : DoubleTensor - size: 6x1
3 : DoubleTensor - size: 6x1
4 : DoubleTensor - size: 6x1
5 : DoubleTensor - size: 6x1
6 : DoubleTensor - size: 6x1
}
}
but I still get:
size mismatch, m1: [6 x 1], m2: [6 x 1] at /home/local/ANT/ogunmolu/torch/pkg/torch/lib/TH/generic/THTensorMath.c:706
stack traceback:
[C]: in function 'addmm'
...l/ANT/ogunmolu/torch/install/share/lua/5.1/nn/Linear.lua:75: in function 'updateGradInput'
...T/ogunmolu/torch/install/share/lua/5.1/nn/Sequential.lua:55: in function 'updateGradInput'
...NT/ogunmolu/torch/install/share/lua/5.1/rnn/Recursor.lua:45: in function '_updateGradInput'
...lu/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:46: in function 'updateGradInput'
...T/ogunmolu/torch/install/share/lua/5.1/rnn/Sequencer.lua:78: in function 'updateGradInput'
...l/ANT/ogunmolu/torch/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
rnn.lua:475: in function 'train'
rnn.lua:688: in main chunk
[C]: in function 'dofile'
...molu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406240
@lakehanne I don't think the above code is implementing a 1-input to 6-output mapping. For that, you should decorate your Recurrent module with a Repeater. The Repeater takes a single input of size batchsize x inputsize and outputs a table of tensors of size seqlen x batchsize x outputsize, where seqlen would be 6 in your case. Is this what you are looking for?
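For concreteness, a minimal sketch of that decoration could look like the following (the hidden size, rho and outputsize below are placeholder values of mine, not taken from your code):
require 'rnn'
local batchsize, inputsize, hiddensize, outputsize, seqlen = 6, 1, 7, 1, 6
-- a simple recurrent module: input -> hidden, with hidden-to-hidden feedback
local rm = nn.Recurrent(
   hiddensize,                        -- size of the hidden state
   nn.Linear(inputsize, hiddensize),  -- input module
   nn.Linear(hiddensize, hiddensize), -- feedback module: output(t-1) -> hidden
   nn.ReLU(),                         -- transfer function
   99999                              -- rho: max number of BPTT steps
)
-- decorate with Repeater: one input produces seqlen outputs
local net = nn.Repeater(
   nn.Sequential():add(rm):add(nn.Linear(hiddensize, outputsize)),
   seqlen
)
local input = torch.randn(batchsize, inputsize) -- a single batch of inputs
local outputs = net:forward(input)              -- table of seqlen tensors,
                                                -- each of size batchsize x outputsize
print(#outputs, outputs[1]:size())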
@nicholas-leonard thanks for the response. To be honest, I am just hearing about the Repeater module for the first time. A couple of questions:
1) Is there a documentation somewhere for the Repeater module?
2) Also, in the outputs table of tensors of size seqlen X batchSize x outputsize, if seqlen is 6, what would outputsize be?
Thanks in advance!
@lakehanne
Hope that helps.
Thanks for the reply @nicholas-leonard. I tried your suggestion. Here's what I found:
I set up the network as follows:
ninputs = 1
noutputs = 6
nhiddens = 1
rho = 5 -- the max number of backprop steps to take back in time
start = 1 -- the size of the output (excluding the batch dimension)
rnnInput = nn.Linear(ninputs, start) -- input module: maps inputs to the hidden state
feedback = nn.Linear(start, ninputs) -- module that feeds back the previous output to the transfer module
transfer = nn.ReLU()
r = nn.Recurrent(start,
rnnInput, --input module from inputs to outs
feedback,
transfer,
rho
)
neunet = nn.Sequential()
:add(r)
:add(nn.Linear(nhiddens, noutputs))
neunet = nn.Repeater(neunet, noutputs)
When I start training in mini-batches of 6, as in:
for step = 1, rho do
outputs[step] = neunet:forward(inputs[step])
_, outputs[step] = catOut(outputs, step, noutputs, opt)
print('outputs[step]'); print(outputs[step])
--reshape output data
_, targetsTable = catOut(targets, step, noutputs, opt)
err = err + cost:forward(outputs[step], targetsTable)
print('err', err)
end
print(string.format("Step %d, Loss error = %f ", iter, err ))
The call outputs[step] = neunet:forward(inputs[step]) produces a 6 x 6 output matrix, which I still do not understand. It should be a 6 x 1 table with each element consisting of a 6 x 1 vector. Anyhow, I wrote a minimal function that reshapes the output the way I would expect to see it, i.e. the line _, outputs[step] = catOut(outputs, step, noutputs, opt).
When I start my backward propagation through time,
local gradOutputs, gradInputs = {}, {}
for step = rho, 1, -1 do --we basically reverse order of forward calls
gradOutputs[step] = cost:backward(outputs[step], targets[step])
--resize inputs before backward call
inputs_bkwd = gradInputResize(inputs, step, noutputs, opt)
--inputs_bkwd = inputs[step]:view(6, 1):expand(6,6)
print('inputs_bkwd'); print(inputs_bkwd)
print('gradOutputs'); print(gradOutputs[step])
print('#inputs_bkwd'); print(#inputs_bkwd)
print('#gradOutputs'); print(#gradOutputs[step])
gradInputs[step] = neunet:backward(inputs_bkwd, gradOutputs[step])
-- print('gradInputs'); print(gradInputs)
end
The algorithm gets stuck with a size-mismatch error when I call backward. This is the structure of the data being passed to backward:
inputs_bkwd
{
1 : DoubleTensor - size: 6
2 : DoubleTensor - size: 6
3 : DoubleTensor - size: 6
4 : DoubleTensor - size: 6
5 : DoubleTensor - size: 6
6 : DoubleTensor - size: 6
}
gradOutputs
{
1 : DoubleTensor - size: 6x1
2 : DoubleTensor - size: 6x1
3 : DoubleTensor - size: 6x1
4 : DoubleTensor - size: 6x1
5 : DoubleTensor - size: 6x1
6 : DoubleTensor - size: 6x1
}
The error is the same as before, viz.:
/home/lex/torch/install/bin/lua: /home/lex/torch/install/share/lua/5.1/nn/Linear.lua:75: size mismatch, m1: [6 x 1], m2: [6 x 1] at /home/lex/torch/pkg/torch/lib/TH/generic/THTensorMath.c:766
stack traceback:
[C]: in function 'addmm'
/home/lex/torch/install/share/lua/5.1/nn/Linear.lua:75: in function 'updateGradInput'
...me/lex/torch/install/share/lua/5.1/nn/Sequential.lua:55: in function 'updateGradInput'
...ome/lex/torch/install/share/lua/5.1/rnn/Recursor.lua:45: in function '_updateGradInput'
...orch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:46: in function 'updateGradInput'
...ome/lex/torch/install/share/lua/5.1/rnn/Repeater.lua:40: in function 'updateGradInput'
/home/lex/torch/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
rnn.lua:472: in function 'train'
rnn.lua:685: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: ?
What could I be missing in the way I am setting up my decorator? I would appreciate your input. Thank you for your time in replying earlier.
I tried using the nn.AbstractRecurrent([rho]) class instead, but the interpreter has issues with the module:
/home/lex/torch/install/bin/lua: rnn.lua:239: unexpected symbol near '['
stack traceback:
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: ?
When I remove the square brackets around rho, I get:
/home/lex/torch/install/bin/lua: ...orch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:208: attempt to call field '__tostring__' (a nil value)
stack traceback:
...orch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:208: in function <...orch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:204>
[C]: ?
[C]: in function 'tostring'
...me/lex/torch/install/share/lua/5.1/nn/Sequential.lua:118: in function '__tostring__'
...e/lex/torch/install/share/lua/5.1/dpnn/Decorator.lua:34: in function <...e/lex/torch/install/share/lua/5.1/dpnn/Decorator.lua:32>
[C]: ?
[C]: in function 'tostring'
...ome/lex/torch/install/share/lua/5.1/rnn/Repeater.lua:85: in function <...ome/lex/torch/install/share/lua/5.1/rnn/Repeater.lua:79>
[C]: ?
[C]: in function 'tostring'
/home/lex/torch/install/share/lua/5.1/trepl/init.lua:257: in function 'rawprint'
/home/lex/torch/install/share/lua/5.1/trepl/init.lua:297: in function 'print'
rnn.lua:249: in function 'contruct_net'
rnn.lua:319: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: ?
@lakehanne You are using noutputs = 6 for both the outputsize via nn.Linear(..., outputsize) and as the seqlen in nn.Repeater(..., seqlen). So that is why your output is 6x6.
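To illustrate with the variable names from your snippet: the two sixes play different roles, so keep them apart. If each of the six outputs should be a single value per batch row, then the Linear width is 1 and the Repeater length is 6 (a sketch, not a drop-in fix):
-- outputsize: width of each per-step output (nn.Linear)
-- seqlen:     how many times nn.Repeater unrolls the wrapped module
local outputsize, seqlen = 1, 6
neunet = nn.Repeater(
   nn.Sequential()
      :add(r)                                -- the nn.Recurrent module from your snippet
      :add(nn.Linear(nhiddens, outputsize)), -- one value per step and batch row
   seqlen)
-- forward on a batchsize x inputsize tensor now returns a table of
-- seqlen tensors, each of size batchsize x outputsize (6 x 1 here)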
Thank you very much indeed. That helped.
My network is now constructed properly with
--we first model the inputs to states
ffwd = nn.Sequential()
:add(nn.Linear(ninputs, nhiddens))
:add(transfer)
--then pass states to outputs
r = nn.Recurrent(start,
rnnInput, --input module from inputs to outs
feedback,
transfer,
rho
)
-- r = nn.AbstractRecurrent(rho)
--we then join the feedforward with the recurrent net
neunet = nn.Sequential()
:add(ffwd)
:add(r)
:add(nn.Linear(nhiddens, 1))
neunet = nn.Repeater(neunet, noutputs)
and this yields a network graph of the following sort:
rnn
nn.Repeater {
[ input, input, ..., input ]
V V V
nn.Recursor @ nn.Sequential {
[input -> (1) -> (2) -> (3) -> output]
(1): nn.Sequential {
[input -> (1) -> (2) -> output]
(1): nn.Linear(1 -> 1)
(2): nn.ReLU
}
(2): nn.Recurrent {
[{input(t), output(t-1)} -> (1) -> (2) -> (3) -> output(t)]
(1): {
input(t)
|`-> (t==0): nn.Add
|`-> (t~=0): nn.Linear(1 -> 1)
output(t-1)
|`-> nn.Linear(1 -> 1)
}
(2): nn.CAddTable
(3): nn.ReLU
}
(3): nn.Linear(1 -> 1)
}
V V V
[output(1),output(2),...,output(6)]
But when I do BPTT, the first iteration of the gradInputs gets computed, whereas subsequent calls throw errors like:
/home/lex/torch/install/bin/lua: ...ome/lex/torch/install/share/lua/5.1/rnn/Recursor.lua:41: assertion failed!
stack traceback:
[C]: in function 'assert'
...ome/lex/torch/install/share/lua/5.1/rnn/Recursor.lua:41: in function '_updateGradInput'
...orch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:46: in function 'updateGradInput'
...ome/lex/torch/install/share/lua/5.1/rnn/Repeater.lua:40: in function 'updateGradInput'
/home/lex/torch/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
rnn.lua:462: in function 'train'
rnn.lua:676: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: ?
My BPTT code snippet is
local gradOutputs, gradInputs = {}, {}
for step = rho, 1, -1 do --we basically reverse order of forward calls
gradOutputs[step] = cost:backward(outputs[step], targets[step])
gradInputs[step] = neunet:backward(inputs[step], gradOutputs[step])
neunet:updateParameters(opt.rnnlearningRate)
end
Perhaps I am not passing the right input size to backward, but I cannot find anything wrong so far. Maybe another pair of eyes can help out.
@lakehanne You shouldn't need to use a for-loop. That is handled internally by the Repeater:
cost = nn.SequencerCriterion(cost) -- or, if you have only one target: nn.RepeaterCriterion(cost)
outputs = neunet:forward(input)
loss = cost:forward(outputs, targets)
gradOutputs = cost:backward(outputs, targets)
neunet:zeroGradParameters()
neunet:backward(input, gradOutputs)
Also, since you are using Repeater, you should only need one input.
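Roughly, the targets for the two decorated criteria are laid out like this (illustrative sizes only, not your data):
local seqlen, batchsize, outputsize = 6, 6, 1  -- placeholders
-- SequencerCriterion: one target per step, mirroring the outputs table
local seqTargets = {}
for s = 1, seqlen do
   seqTargets[s] = torch.randn(batchsize, outputsize)
end
-- RepeaterCriterion: a single target, compared against every step's output
local repTarget = torch.randn(batchsize, outputsize)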
Wow! A couple of questions:
1) The snippet you have up there doesn't seem to implement batches of inputs. I have noticed that most of your examples also do not do batches. Are batches handled internally in the Repeater module as well?
2) Also, the targets in the snippet above are not incremented after the forward calls, as some of your examples show. Is there no need for this with the Repeater module either?
3) Lastly, does this mean I should basically make all targets equal to the true outputs without worrying about incrementing sequence indices after every forward call?
Thanks for your quick response and I appreciate your help so far.
1) The snippet uses batches. As far as I can tell, all our examples use batches. Again, the input is a tensor of size batchsize x inputsize. The output has size seqlen x batchsize x outputsize. A batch does not require a for loop; the difference between batch and online mode is handled internally by the modules.
2) Targets are up to you to determine. When incremented in the examples, it is for language models. In your case, it is up to you to define your dataset. You are using Repeater so I assume it isn't a language model.
3) That depends. What are the inputs and targets of your dataset?
Thank you very much indeed, @nicholas-leonard !
1) I had it all mixed up until you explained it in your previous post. I have the Repeater working now. I simply set up my input as batchSize x inputSize, which in my case is 6 x 1, and generated an output of size seqlen x batchSize x outputSize, where seqlen = 6 and outputSize = 1. Since I am using rho = 5 time steps, my predictions after calling forward are of size 5 x 6 x 1, as you rightly noticed. Everything works perfectly and I am so indebted to you for helping out.
2 and 3) I simply incremented my targets in steps of 1 based on the delay I found in the data. I must say my problem does not fall into the standard machine learning domain. It is the system identification and control of a biomedical robot targeted towards cancer radiotherapy. My input is the current applied to a pneumatic valve that controls a soft robot, while the output is the corresponding motion of a patient's head (6-DOF motion, hence 6 outputs).
A few questions again :-):
1) gradInputs are always zero. I suspect something is wrong.
2) The loss error behaves irregularly:
Step 9718, Loss error = 91287.138996
Step 9719, Loss error = 91287.138996
Step 9720, Loss error = 91287.138996
Step 9721, Loss error = 96812.036729
Step 9722, Loss error = 96812.030329
Step 9723, Loss error = 94386.266278
Step 9724, Loss error = 98474.770919
Step 9725, Loss error = 91302.628445
Step 9726, Loss error = 94854.937780
Step 9727, Loss error = 101400.172578
Step 9728, Loss error = 91334.555197
Step 9729, Loss error = 91289.035644
Step 9730, Loss error = 91287.214862
Step 9731, Loss error = 91287.142031
i.e., it decreases, increases, and then decreases again. I suspect this shouldn't be. Here is the relevant code section. I am sorry I am taking up so much of your time, but I would definitely appreciate your insight:
local iter = 1
for t = 1, train_input:size()[1], 1 do
offsets = train_input[{ {t} }]
offsets = torch.LongTensor():resize(offsets:size()[1]):copy(offsets)
--BPTT
inputs, targets = {}, {}
--batch of inputs
inputs = train_input:index(1, offsets)
targets = {train_out[1]:index(1, offsets), train_out[2]:index(1, offsets),
train_out[3]:index(1, offsets), train_out[4]:index(1, offsets),
train_out[5]:index(1, offsets), train_out[6]:index(1, offsets)}
--increase offsets indices by 1
offsets = train_input[{ {t+1} }]
offsets = torch.LongTensor():resize(offsets:size()[1]):copy(offsets)
--2. Forward sequence through rnn
neunet:zeroGradParameters()
neunet:forget() --forget all past time steps
outputs, err = {}, 0
outputs = neunet:forward(inputs)
err = err + cost:forward(outputs, targets)
print(string.format("Step %d, Loss error = %f ", iter, err ))
--3. backward propagation through time (Werbos, 1990; Rumelhart, 1986)
local gradOutputs, gradInputs = {}, {}
gradOutputs = cost:backward(outputs, targets)
gradInputs = neunet:backward(inputs, gradOutputs)
--print('gradOutputs'); print(gradOutputs)
--4. update the parameters with the learning rate
neunet:updateParameters(opt.rnnlearningRate)
iter = iter + 1
end
@lakehanne So given the current, which is a single number (based on your inputsize = 1), you are trying to predict the motion of a robot along 6 DOF. If your data is sequential, you should use a different model. So for example, assuming your dataset is a sequence of N time-steps where x[t] depends on x[t-1] and so on, it could look like:
dataset = {inputs = torch.Tensor(N), targets = torch.Tensor(N, 6)}
Then you should use something like a Sequencer instead of Repeater. Is this the kind of dataset you have?
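If so, a rough sketch of the Sequencer pipeline could look like this (the layer sizes, seqlen, batchsize and the random batch are placeholders of mine, just to show the shapes):
require 'rnn'
local inputsize, hiddensize, outputsize = 1, 10, 6
local seqlen, batchsize = 10, 8  -- illustrative values
local rm = nn.Recurrent(
   hiddensize,
   nn.Linear(inputsize, hiddensize),   -- input module
   nn.Linear(hiddensize, hiddensize),  -- feedback module
   nn.ReLU(),
   99999)                              -- rho
local net  = nn.Sequencer(nn.Sequential():add(rm):add(nn.Linear(hiddensize, outputsize)))
local crit = nn.SequencerCriterion(nn.MSECriterion())
-- one training batch: tables of seqlen entries, each batchsize x size
local inputs, targets = {}, {}
for s = 1, seqlen do
   inputs[s]  = torch.randn(batchsize, inputsize)
   targets[s] = torch.randn(batchsize, outputsize)
end
local outputs = net:forward(inputs)
local loss = crit:forward(outputs, targets)
net:zeroGradParameters()
net:backward(inputs, crit:backward(outputs, targets))
net:updateParameters(0.01)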
Yes, this is the sort of dataset I have. Given the current to a pneumatic valve which pumps air into a robot, I am trying to predict the 6-DOF motion that the robot actuator effects on an object. I was using the Sequencer before, but you advised against it as I was getting stuck with a size-mismatch error when I called backward. See comment 2 above (posted 11 days ago).
My data is indeed sequential, as you rightly noticed, with a sequence of N time-steps where x[t] depends on x[t-1] and so on. Are you suggesting I abandon the nn.Repeater decorator and go back to nn.Sequencer instead?
Yes. Obviously there was a lot of confusion in the above thread. What you want is to use Sequencer. The inputs to your model will be of size seqlen x batchsize x inputsize, and the outputs will be seqlen x batchsize x outputsize. The seqlen is a hyper-parameter. It should be something like 10, 20, 50 or 100. When you go through your dataset, your inputs and targets will be batches of seqlen rows, where seqlen << N.
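For illustration, slicing the dataset suggested above into seqlen-long chunks could look like this (my loop, with batchsize kept at 1 so the indexing stays readable):
-- dataset.inputs  : torch.Tensor(N)    -- current at each time-step
-- dataset.targets : torch.Tensor(N, 6) -- 6-DOF motion at each time-step
local N, seqlen = dataset.inputs:size(1), 50
for t = 1, N - seqlen + 1, seqlen do
   local inputs, targets = {}, {}
   for s = 1, seqlen do
      inputs[s]  = dataset.inputs:sub(t+s-1, t+s-1):view(1, 1)  -- 1 x inputsize
      targets[s] = dataset.targets[t+s-1]:view(1, -1)           -- 1 x outputsize
   end
   -- forward/backward with the Sequencer and SequencerCriterion as above
end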
Thanks, Leonard! Just to be roundly sure: are the gradInputs = 0 that result from my Repeater training, and the irregularity in the loss/error, because I did not use the Sequencer module?
Secondly, what do you mean by "When you go through your dataset, your chunks 'seqlen' rows ..."?
@lakehanne Corrected in original comment.
What's the purpose of the seqlen hyperparameter? Is it the same as rho? And do you form batchSize with something of this nature:
offsets = torch.LongTensor(opt.batchSize):random(1,opt.dataSize)
where dataSize is the length of the data, e.g. N above?
Yes. Yes. Yes. :)
Thanks! If that is the case, it keeps getting stuck during calls to backward, like I originally had in post 1, i.e. gradInputs cannot be computed because of inconsistent tensor sizes.
My problem so far has been getting the network to predict six outputs from a single input, because in a clinical scenario I would want the network to choose the desired input that gives the right Twist (6-DOF) motion. I have control over the input; what I do not have control over is the output. The Sequencer module seems to be capable of one-to-one predictions only. I think what I want is a one-to-many predictor such as the Repeater module you spoke of earlier.
Here is a public gist of the code section that's causing World War III in my head :)
Sorry to be taking so much of your time, but I would love to hear what you think might be contributing to these errors.
Fixed in my FARNN SR-AllModels tag. Thanks for your help @nicholas-leonard !
Hi everyone,
I have a single-input, six-output system which I am training with the rnn module. I construct the rnn network as follows:
I should mention that I am training in mini-batches of 6 at a time. When I call forward as follows,
I end up with an output that is a 6 x 6 matrix. I would expect to obtain a 6 x 1 table with each element consisting of 6 outputs. I wonder if this is normal; if it is not, can someone please explain what I could possibly be doing wrong?
Thanks!