Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

Incorrect prediction after calling forward #179

Closed robotsorcerer closed 8 years ago

robotsorcerer commented 8 years ago

Hi everyone,

I have a single-input, six-output system that I am training with the rnn module. I construct the RNN as follows:

ninputs  = 1
noutputs = 6
--Recurrent Neural Net Initializations
if opt.model == 'rnn' then
  rho         = 5                          -- max number of backprop steps to take back in time
  start       = 1                          -- size of the output (excluding the batch dimension)
  rnnInput    = nn.Linear(ninputs, start)  -- input module: maps inputs to the hidden state
  feedback    = nn.Linear(start, ninputs)  -- module that feeds back the previous output to the transfer module
  transfer    = nn.ReLU()                  -- transfer function
end
--[[Set up the network; add layers in place as we add more abstraction]]
function contruct_net()
  if opt.model == 'mlp' then
    neunet = nn.Sequential()
    neunet:add(nn.Linear(ninputs, nhiddens))
    neunet:add(transfer)
    neunet:add(nn.Linear(nhiddens, noutputs))

  elseif opt.model == 'rnn' then
    require 'rnn'
    --RNN
    r = nn.Recurrent(start,
                     rnnInput,   -- input module: from inputs to hidden state
                     feedback,   -- feedback module: previous output back in
                     transfer,   -- transfer function
                     rho)        -- max BPTT steps

    neunet = nn.Sequential()
              :add(r)
              :add(nn.Linear(nhiddens_rnn, noutputs))

    neunet = nn.Sequencer(neunet)
  end
end

I should mention that I am training in mini-batches of 6 at a time. When I call forward as follows:

for step = 1, rho do
  table.insert(inputs_, inputs[step])
  outputs[step] = neunet:forward(inputs_)
end

I end up with an output that is a 6 x 6 matrix. I would expect a table of 6 entries, each consisting of the 6 outputs. Is this normal? If not, can someone please explain what I might be doing wrong?

Thanks!

robotsorcerer commented 8 years ago

I rearranged the output into a table of the form:

 outputs
{
  1 : 
    {
      1 : DoubleTensor - size: 6x1
      2 : DoubleTensor - size: 6x1
      3 : DoubleTensor - size: 6x1
      4 : DoubleTensor - size: 6x1
      5 : DoubleTensor - size: 6x1
      6 : DoubleTensor - size: 6x1
    }
  2 : 
    {
      1 : DoubleTensor - size: 6x1
      2 : DoubleTensor - size: 6x1
      3 : DoubleTensor - size: 6x1
      4 : DoubleTensor - size: 6x1
      5 : DoubleTensor - size: 6x1
      6 : DoubleTensor - size: 6x1
    }
  3 : 
    {
      1 : DoubleTensor - size: 6x1
      2 : DoubleTensor - size: 6x1
      3 : DoubleTensor - size: 6x1
      4 : DoubleTensor - size: 6x1
      5 : DoubleTensor - size: 6x1
      6 : DoubleTensor - size: 6x1
    }
  4 : 
    {
      1 : DoubleTensor - size: 6x1
      2 : DoubleTensor - size: 6x1
      3 : DoubleTensor - size: 6x1
      4 : DoubleTensor - size: 6x1
      5 : DoubleTensor - size: 6x1
      6 : DoubleTensor - size: 6x1
    }
  5 : 
    {
      1 : DoubleTensor - size: 6x1
      2 : DoubleTensor - size: 6x1
      3 : DoubleTensor - size: 6x1
      4 : DoubleTensor - size: 6x1
      5 : DoubleTensor - size: 6x1
      6 : DoubleTensor - size: 6x1
    }
}

but I still get:

size mismatch, m1: [6 x 1], m2: [6 x 1] at /home/local/ANT/ogunmolu/torch/pkg/torch/lib/TH/generic/THTensorMath.c:706
stack traceback:
    [C]: in function 'addmm'
    ...l/ANT/ogunmolu/torch/install/share/lua/5.1/nn/Linear.lua:75: in function 'updateGradInput'
    ...T/ogunmolu/torch/install/share/lua/5.1/nn/Sequential.lua:55: in function 'updateGradInput'
    ...NT/ogunmolu/torch/install/share/lua/5.1/rnn/Recursor.lua:45: in function '_updateGradInput'
    ...lu/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:46: in function 'updateGradInput'
    ...T/ogunmolu/torch/install/share/lua/5.1/rnn/Sequencer.lua:78: in function 'updateGradInput'
    ...l/ANT/ogunmolu/torch/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
    rnn.lua:475: in function 'train'
    rnn.lua:688: in main chunk
    [C]: in function 'dofile'
    ...molu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406240

nicholas-leonard commented 8 years ago

@lakehanne I don't think the above code implements a 1-input-to-6-output model. For that, you should decorate your Recurrent module with a Repeater. The Repeater takes a single input of size batchsize x inputsize and outputs a table of tensors of size seqlen x batchsize x outputsize, where seqlen would be 6 in your case. Is this what you are looking for?

robotsorcerer commented 8 years ago

@nicholas-leonard thanks for the response. To be honest, this is the first I am hearing of the Repeater module. A couple of questions:

1) Is there documentation somewhere for the Repeater module?

2) Also, in the output table of tensors of size seqlen x batchSize x outputSize, if seqlen is 6, what would outputSize be?

Thanks in advance!

nicholas-leonard commented 8 years ago

@lakehanne

  1. Docs can be found here: https://github.com/Element-Research/rnn#rnn.Repeater. It isn't much, I admit.
  2. If you are using a Recurrent with an inputSize of 10 and an outputSize of 20, and you decorate it with a Repeater whose rho argument is 6 (the output sequence length), and your batch size is 5 so that your input to the Repeater(Recurrent) is 5 x 10, then the output will be 6 x 5 x 20.
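
A quick way to sanity-check those shapes (a sketch, not the code from this thread, using the sizes from point 2 above):

require 'rnn'

local inputsize, outputsize, seqlen, batchsize = 10, 20, 6, 5

-- recurrent module with inputSize 10 and outputSize 20
local r = nn.Recurrent(
   outputsize,                         -- size of the hidden/output state
   nn.Linear(inputsize, outputsize),   -- input module
   nn.Linear(outputsize, outputsize),  -- feedback module
   nn.ReLU(),                          -- transfer function
   seqlen)                             -- rho: max BPTT steps

-- decorate with a Repeater that repeats the same input for seqlen steps
local model = nn.Repeater(r, seqlen)

local input   = torch.randn(batchsize, inputsize)  -- 5 x 10
local outputs = model:forward(input)
print(#outputs, outputs[1]:size())

Here the Repeater returns the 6 outputs as a sequence with one 5 x 20 tensor per step, i.e. the 6 x 5 x 20 described above.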

Hope that helps.

robotsorcerer commented 8 years ago

Thanks for the reply, @nicholas-leonard. I tried your suggestion. Here's what I found:

I set up the network as follows:

ninputs     = 1
noutputs    = 6
nhiddens    = 1
rho         = 5                          -- max number of backprop steps to take back in time
start       = 1                          -- size of the output (excluding the batch dimension)
rnnInput    = nn.Linear(ninputs, start)  -- input module: maps inputs to the hidden state
feedback    = nn.Linear(start, ninputs)  -- module that feeds back the previous output to the transfer module
transfer    = nn.ReLU()

r = nn.Recurrent(start,
                 rnnInput,   -- input module: from inputs to hidden state
                 feedback,
                 transfer,
                 rho)

neunet = nn.Sequential()
          :add(r)
          :add(nn.Linear(nhiddens, noutputs))

neunet = nn.Repeater(neunet, noutputs)

When I start training in mini-batches of 6, as in:

for step = 1, rho do
  outputs[step] = neunet:forward(inputs[step])
  _, outputs[step] = catOut(outputs, step, noutputs, opt)
  print('outputs[step]'); print(outputs[step])
  --reshape the target data
  _, targetsTable = catOut(targets, step, noutputs, opt)
  err = err + cost:forward(outputs[step], targetsTable)
  print('err', err)
end
print(string.format("Step %d, Loss error = %f ", iter, err))

The call outputs[step] = neunet:forward(inputs[step]) produces a 6 x 6 output matrix, which I still do not understand. It should be a table of 6 entries, each a 6 x 1 vector. Anyhow, I wrote a minimal function that reshapes the output the way I would expect to see it, i.e. the line _, outputs[step] = catOut(outputs, step, noutputs, opt).

When I start my backward propagation through time,

local gradOutputs, gradInputs = {}, {}
for step = rho, 1, -1 do  --we basically reverse the order of the forward calls
  gradOutputs[step] = cost:backward(outputs[step], targets[step])

  --resize inputs before the backward call
  inputs_bkwd = gradInputResize(inputs, step, noutputs, opt)
  --inputs_bkwd = inputs[step]:view(6, 1):expand(6, 6)
  print('inputs_bkwd'); print(inputs_bkwd)
  print('gradOutputs'); print(gradOutputs[step])
  print('#inputs_bkwd'); print(#inputs_bkwd)
  print('#gradOutputs'); print(#gradOutputs[step])
  gradInputs[step] = neunet:backward(inputs_bkwd, gradOutputs[step])
  -- print('gradInputs'); print(gradInputs)
end

The algorithm gets stuck with a size-mismatch error when I call backward. This is the structure of the data being passed to backward:

inputs_bkwd 
{
  1 : DoubleTensor - size: 6
  2 : DoubleTensor - size: 6
  3 : DoubleTensor - size: 6
  4 : DoubleTensor - size: 6
  5 : DoubleTensor - size: 6
  6 : DoubleTensor - size: 6
}
gradOutputs 
{
  1 : DoubleTensor - size: 6x1
  2 : DoubleTensor - size: 6x1
  3 : DoubleTensor - size: 6x1
  4 : DoubleTensor - size: 6x1
  5 : DoubleTensor - size: 6x1
  6 : DoubleTensor - size: 6x1
}

The error is the same as before, viz.:

/home/lex/torch/install/bin/lua: /home/lex/torch/install/share/lua/5.1/nn/Linear.lua:75: size mismatch, m1: [6 x 1], m2: [6 x 1] at /home/lex/torch/pkg/torch/lib/TH/generic/THTensorMath.c:766
stack traceback:
    [C]: in function 'addmm'
    /home/lex/torch/install/share/lua/5.1/nn/Linear.lua:75: in function 'updateGradInput'
    ...me/lex/torch/install/share/lua/5.1/nn/Sequential.lua:55: in function 'updateGradInput'
    ...ome/lex/torch/install/share/lua/5.1/rnn/Recursor.lua:45: in function '_updateGradInput'
    ...orch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:46: in function 'updateGradInput'
    ...ome/lex/torch/install/share/lua/5.1/rnn/Repeater.lua:40: in function 'updateGradInput'
    /home/lex/torch/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
    rnn.lua:472: in function 'train'
    rnn.lua:685: in main chunk
    [C]: in function 'dofile'
    .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: ?

What could I be missing in the way I am setting up the decorator? I would appreciate your input. Thank you for taking the time to reply earlier.

robotsorcerer commented 8 years ago

I tried using the nn.AbstractRecurrent([rho]) class instead, but the interpreter complains:

/home/lex/torch/install/bin/lua: rnn.lua:239: unexpected symbol near '['
stack traceback:
    [C]: in function 'dofile'
    .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: ?

When I remove the square brackets around rho, I get:

/home/lex/torch/install/bin/lua: ...orch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:208: attempt to call field '__tostring__' (a nil value)
stack traceback:
    ...orch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:208: in function <...orch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:204>
    [C]: ?
    [C]: in function 'tostring'
    ...me/lex/torch/install/share/lua/5.1/nn/Sequential.lua:118: in function '__tostring__'
    ...e/lex/torch/install/share/lua/5.1/dpnn/Decorator.lua:34: in function <...e/lex/torch/install/share/lua/5.1/dpnn/Decorator.lua:32>
    [C]: ?
    [C]: in function 'tostring'
    ...ome/lex/torch/install/share/lua/5.1/rnn/Repeater.lua:85: in function <...ome/lex/torch/install/share/lua/5.1/rnn/Repeater.lua:79>
    [C]: ?
    [C]: in function 'tostring'
    /home/lex/torch/install/share/lua/5.1/trepl/init.lua:257: in function 'rawprint'
    /home/lex/torch/install/share/lua/5.1/trepl/init.lua:297: in function 'print'
    rnn.lua:249: in function 'contruct_net'
    rnn.lua:319: in main chunk
    [C]: in function 'dofile'
    .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: ?

nicholas-leonard commented 8 years ago

@lakehanne You are using noutputs = 6 both as the outputsize in nn.Linear(..., outputsize) and as the seqlen in nn.Repeater(..., seqlen). That is why your output is 6 x 6.

robotsorcerer commented 8 years ago

Thank you very much indeed. That helped.

My network is now constructed properly with

--we first model the inputs to states
ffwd = nn.Sequential()
        :add(nn.Linear(ninputs, nhiddens))
        :add(transfer)

--then pass states to outputs
r = nn.Recurrent(start,
                 rnnInput,   -- input module: from inputs to hidden state
                 feedback,
                 transfer,
                 rho)

-- r = nn.AbstractRecurrent(rho)
--we then join the feedforward net with the recurrent net
neunet = nn.Sequential()
          :add(ffwd)
          :add(r)
          :add(nn.Linear(nhiddens, 1))

neunet = nn.Repeater(neunet, noutputs)

and this yields a network graph of the following sort:

rnn 
nn.Repeater {
  [  input,    input,  ...,  input  ]
       V         V             V     
  nn.Recursor @ nn.Sequential {
    [input -> (1) -> (2) -> (3) -> output]
    (1): nn.Sequential {
      [input -> (1) -> (2) -> output]
      (1): nn.Linear(1 -> 1)
      (2): nn.ReLU
    }
    (2): nn.Recurrent {
      [{input(t), output(t-1)} -> (1) -> (2) -> (3) -> output(t)]
      (1):  {
        input(t)
          |`-> (t==0): nn.Add
          |`-> (t~=0): nn.Linear(1 -> 1)
        output(t-1)
          |`-> nn.Linear(1 -> 1)
      }
      (2): nn.CAddTable
      (3): nn.ReLU
    }
    (3): nn.Linear(1 -> 1)
  }
       V         V             V     
  [output(1),output(2),...,output(6)]
}

But when I do BPTT, the first iteration of the gradInputs gets computed, whereas subsequent calls throw errors like:

/home/lex/torch/install/bin/lua: ...ome/lex/torch/install/share/lua/5.1/rnn/Recursor.lua:41: assertion failed!
stack traceback:
    [C]: in function 'assert'
    ...ome/lex/torch/install/share/lua/5.1/rnn/Recursor.lua:41: in function '_updateGradInput'
    ...orch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:46: in function 'updateGradInput'
    ...ome/lex/torch/install/share/lua/5.1/rnn/Repeater.lua:40: in function 'updateGradInput'
    /home/lex/torch/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
    rnn.lua:462: in function 'train'
    rnn.lua:676: in main chunk
    [C]: in function 'dofile'
    .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: ?

My BPTT code snippet is

local gradOutputs, gradInputs = {}, {}
for step = rho, 1, -1 do  --we basically reverse the order of the forward calls
  gradOutputs[step] = cost:backward(outputs[step], targets[step])
  gradInputs[step]  = neunet:backward(inputs[step], gradOutputs[step])
  neunet:updateParameters(opt.rnnlearningRate)
end

Perhaps I am not passing the right input size to backward, but I cannot find anything wrong so far. Maybe another pair of eyes can help.

nicholas-leonard commented 8 years ago

@lakehanne You shouldn't need to use a for-loop. That is handled internally by the Repeater:

cost = nn.SequencerCriterion(cost) -- or, if you have only one target: nn.RepeaterCriterion(cost)
outputs = neunet:forward(input)
loss = cost:forward(outputs, targets) 
gradOutputs = cost:backward(outputs, targets)
neunet:zeroGradParameters()
neunet:backward(input, gradOutputs)

Also, since you are using a Repeater, you only need one input.
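
For reference, a self-contained version of that recipe could look like this. This is only a sketch: the sizes, the MSE loss, and the choice of one target per step with SequencerCriterion are assumptions for illustration, not the thread's actual code.

require 'rnn'

local batchsize, inputsize, hiddensize, seqlen = 6, 1, 1, 6
local lr = 0.1

-- per-step module: a Recurrent layer followed by a projection to ONE output value
local r = nn.Recurrent(hiddensize,
   nn.Linear(inputsize, hiddensize),
   nn.Linear(hiddensize, hiddensize),
   nn.ReLU(), seqlen)
local step = nn.Sequential():add(r):add(nn.Linear(hiddensize, 1))

-- the Repeater repeats the single input for seqlen steps
local neunet = nn.Repeater(step, seqlen)

-- one target per step, so wrap the criterion in a SequencerCriterion
local cost = nn.SequencerCriterion(nn.MSECriterion())

local input = torch.randn(batchsize, inputsize)   -- ONE input per sequence
local targets = {}
for i = 1, seqlen do targets[i] = torch.randn(batchsize, 1) end

local outputs = neunet:forward(input)             -- table of seqlen outputs
local loss = cost:forward(outputs, targets)

local gradOutputs = cost:backward(outputs, targets)
neunet:zeroGradParameters()
neunet:backward(input, gradOutputs)
neunet:updateParameters(lr)
neunet:forget()                                   -- reset stored time-steps before the next sequence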

robotsorcerer commented 8 years ago

Wow! A couple of questions:

1) The snippet above doesn't seem to implement batches of inputs. I have noticed that most of your examples also do not use batches. Are batches handled internally by the Repeater module as well?

2) Also, the targets in the snippet above are not incremented after the forward calls, as some of your examples show. Is there no need for this with the Repeater module either?

3) Lastly, does this mean I should basically set all targets equal to the true outputs, without worrying about incrementing sequence indices after every forward call?

Thanks for your quick response and I appreciate your help so far.

nicholas-leonard commented 8 years ago

1) The snippet uses batches. As far as I can tell, all our examples use batches. Again, the input is a tensor of size batchsize x inputsize. The output has size seqlen x batchsize x outputsize. A batch does not require a for-loop; the difference between batch and online is handled internally by the modules.

2) The targets are up to you to determine. Where they are incremented in the examples, it is for language models. In your case, it is up to you to define your dataset. You are using a Repeater, so I assume it isn't a language model.

3) Depends. What are the inputs and targets of your dataset?

robotsorcerer commented 8 years ago

Thank you very much indeed, @nicholas-leonard !

1) I had it all mixed up until your previous post explained it. I have the Repeater working now. I simply set up my input as batchSize x inputSize, which in my case is 6 x 1, and generated an output of size seqlen x batchSize x outputSize, where seqlen = 6 and outputSize = 1. Since I am using rho = 5 time-steps, my predictions after calling forward are of size 5 x 6 x 1, as you rightly noted. Everything works perfectly and I am indebted to you for helping out.

2 and 3) I simply incremented my targets in steps of 1 based on the delay I found in the data. I must say my problem does not fall into the standard machine learning domain. It is the system identification and control of a biomedical robot targeted at cancer radiotherapy. My input is the current applied to a pneumatic valve that controls a soft robot, while the output is the corresponding motion of a patient's head (6-DOF motion, hence 6 outputs).

A few questions again :-). Why is my training loss not decreasing steadily? It looks like this:

Step 9718, Loss error = 91287.138996    
Step 9719, Loss error = 91287.138996    
Step 9720, Loss error = 91287.138996    
Step 9721, Loss error = 96812.036729    
Step 9722, Loss error = 96812.030329    
Step 9723, Loss error = 94386.266278    
Step 9724, Loss error = 98474.770919    
Step 9725, Loss error = 91302.628445    
Step 9726, Loss error = 94854.937780    
Step 9727, Loss error = 101400.172578   
Step 9728, Loss error = 91334.555197    
Step 9729, Loss error = 91289.035644    
Step 9730, Loss error = 91287.214862    
Step 9731, Loss error = 91287.142031    

It decreases, then increases, then decreases again. I suspect this shouldn't happen. Here is the relevant code section. I am sorry to take so much of your time, but I would definitely appreciate your insight:

local iter = 1
for t = 1, train_input:size()[1], 1 do
  offsets = train_input[{ {t} }]
  offsets = torch.LongTensor():resize(offsets:size()[1]):copy(offsets)

  --1. BPTT inputs and targets
  inputs, targets = {}, {}
  --batch of inputs
  inputs  = train_input:index(1, offsets)
  targets = {train_out[1]:index(1, offsets), train_out[2]:index(1, offsets),
             train_out[3]:index(1, offsets), train_out[4]:index(1, offsets),
             train_out[5]:index(1, offsets), train_out[6]:index(1, offsets)}
  --increase offsets indices by 1
  offsets = train_input[{ {t+1} }]
  offsets = torch.LongTensor():resize(offsets:size()[1]):copy(offsets)

  --2. forward the sequence through the rnn
  neunet:zeroGradParameters()
  neunet:forget()  --forget all past time-steps

  outputs, err = {}, 0
  outputs = neunet:forward(inputs)
  err     = err + cost:forward(outputs, targets)
  print(string.format("Step %d, Loss error = %f ", iter, err))

  --3. backward propagation through time (Werbos, 1990; Rumelhart, 1986)
  local gradOutputs, gradInputs = {}, {}
  gradOutputs = cost:backward(outputs, targets)
  gradInputs  = neunet:backward(inputs, gradOutputs)
  --print('gradOutputs'); print(gradOutputs)

  --4. update the parameters
  neunet:updateParameters(opt.rnnlearningRate)
  iter = iter + 1
end

nicholas-leonard commented 8 years ago

@lakehanne So given the current, which is a single number (based on your inputsize = 1), you are trying to predict the motion of a robot along 6 DOF. If your data is sequential, you should use a different model. For example, assuming your dataset is a sequence of N time-steps where x[t] depends on x[t-1] and so on, it could look like:

dataset = {inputs = torch.Tensor(N), targets = torch.Tensor(N, 6)}

Then you should use something like a Sequencer instead of a Repeater. Is this the kind of dataset you have?
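
For illustration, a minimal sketch of that kind of model (the hidden size, sequence length, and batch size are placeholders, and the Sequencer is fed a table with one entry per time-step):

require 'rnn'

local inputsize, hiddensize, outputsize = 1, 10, 6
local seqlen, batchsize = 20, 4

local r = nn.Recurrent(hiddensize,
   nn.Linear(inputsize, hiddensize),
   nn.Linear(hiddensize, hiddensize),
   nn.ReLU(), seqlen)

-- the Sequencer applies the step module to each element of the input sequence
local neunet = nn.Sequencer(
   nn.Sequential():add(r):add(nn.Linear(hiddensize, outputsize)))
local cost = nn.SequencerCriterion(nn.MSECriterion())

-- one table entry per time-step, each batchsize x inputsize / batchsize x outputsize
local inputs, targets = {}, {}
for t = 1, seqlen do
   inputs[t]  = torch.randn(batchsize, inputsize)
   targets[t] = torch.randn(batchsize, outputsize)
end

local outputs = neunet:forward(inputs)   -- table of seqlen outputs, each batchsize x 6
local loss = cost:forward(outputs, targets)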

robotsorcerer commented 8 years ago

Yes, this is the sort of dataset I have. Given the current to a pneumatic valve that pumps air into the robot, I am trying to predict the 6-DOF motion that the robot actuator produces on an object. I was using the Sequencer before, but you advised against it since I was getting stuck with a size-mismatch error when calling backward. See my second comment above (posted 11 days ago).

My data is indeed sequential, as you rightly noticed: a sequence of N time-steps where x[t] depends on x[t-1] and so on. Are you suggesting I abandon the nn.Repeater decorator and go back to nn.Sequencer instead?

nicholas-leonard commented 8 years ago

Yes. Obviously there was a lot of confusion in the above thread. What you want is to use a Sequencer. The inputs to your model will be of size seqlen x batchsize x inputsize; the outputs will be seqlen x batchsize x outputsize. The seqlen is a hyper-parameter; it should be something like 10, 20, 50, or 100. When you go through your dataset, your inputs and targets will be batches of seqlen rows, where seqlen << N.
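
Concretely, walking through a dataset in chunks of seqlen rows might look roughly like this. A sketch only: it assumes the dataset = {inputs = torch.Tensor(N), targets = torch.Tensor(N, 6)} layout from the earlier comment, the neunet and cost from the previous sketch, a batch size of 1 for brevity, and a version of Sequencer/SequencerCriterion that accepts seqlen x batchsize x size tensors as described above.

local lr, seqlen = 0.01, 20
local N = dataset.inputs:size(1)

for t = 1, N - seqlen + 1, seqlen do
   -- seqlen consecutive rows, reshaped to seqlen x batchsize x size
   local inputs  = dataset.inputs:narrow(1, t, seqlen):view(seqlen, 1, 1)
   local targets = dataset.targets:narrow(1, t, seqlen):view(seqlen, 1, 6)

   neunet:zeroGradParameters()
   local outputs = neunet:forward(inputs)        -- seqlen x 1 x 6
   local loss = cost:forward(outputs, targets)

   local gradOutputs = cost:backward(outputs, targets)
   neunet:backward(inputs, gradOutputs)
   neunet:updateParameters(lr)
end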

robotsorcerer commented 8 years ago

Thanks, Leonard! Just to be completely sure: the gradInputs = 0 that results from my Repeater training, and the irregularity in the loss/error, are because I did not use the Sequencer module?

Secondly, what do you mean by, "When you go through your dataset, your chunks 'seqlen' rows ..."

nicholas-leonard commented 8 years ago

@lakehanne Corrected in original comment.

robotsorcerer commented 8 years ago

What's the purpose of the seqlen hyper-parameter? Is it the same as rho? And do you form the batch with something of this nature:

offsets = torch.LongTensor(opt.batchSize):random(1,opt.dataSize)

where dataSize is the length of the data, i.e. N above?

nicholas-leonard commented 8 years ago

Yes. Yes. Yes. :)
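
For reference, the offsets pattern from the library's examples, adapted to this dataset layout, might look roughly like this. A sketch: dataset, batchsize, and seqlen are placeholders, and whether random starting offsets make sense depends on how the sequences in the data should be aligned.

local batchsize, seqlen = 8, 20
local N = dataset.inputs:size(1)

-- each column of the batch starts at its own random position in the data
local offsets = torch.LongTensor(batchsize):random(1, N)

local inputs  = torch.Tensor(seqlen, batchsize, 1)
local targets = torch.Tensor(seqlen, batchsize, 6)
for step = 1, seqlen do
   inputs[step]:copy(dataset.inputs:index(1, offsets))
   targets[step]:copy(dataset.targets:index(1, offsets))
   -- advance every column by one time-step, wrapping around at the end of the data
   offsets:add(1)
   for j = 1, batchsize do
      if offsets[j] > N then offsets[j] = 1 end
   end
end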

robotsorcerer commented 8 years ago

Thanks! If that is the case, it keeps getting stuck during calls to backward, as in my first post, i.e. the gradInputs cannot be computed because of an inconsistent tensor size.

My problem so far has been getting the network to predict six outputs from a single input, because in a clinical scenario I would want the network to choose the input that gives the right twist (6-DOF) motion. I have control over the input; what I do not have control over is the output. The Sequencer module seems to be capable of one-to-one predictions only. I think what I want is a one-to-many predictor such as the Repeater module you spoke of earlier.

Here is a public gist of the code section that's causing World War III in my head :)

Sorry to take so much of your time, but I would love to hear what you think might be contributing to these errors.

robotsorcerer commented 8 years ago

Fixed in my FARNN SR-AllModels tag. Thanks for your help @nicholas-leonard !