Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

A model that uses clone() the way rnn does has a problem updating parameters #292

Closed yjy765 closed 8 years ago

yjy765 commented 8 years ago

Hello, I am using rnn itself and would also like to make a customized layer that contains a shared network, much like rnn does. The real one I made is too complex to test, so I made a simple layer composed of a shared linear network as follows:

require 'rnn'
local test, parent = torch.class('nn.test','nn.Module')

function test:__init()
   parent.__init(self)
   self.node = nn.Sequential():add(nn.Linear(3,2))
   self.module = {}
   self.module[1] = self.node
end

function test:get(step)
  local module = self.module[step]
  if not module then
    -- lazily create a clone that shares parameters and gradients with self.node
    module = self.node:sharedClone('weight','bias','gradWeight','gradBias')
    self.module[step] = module
  end
  return module
end

function test:updateOutput(input)
  self.output = {}
  for i=1,3 do
    self.output[i] = self:get(i):forward(input)
  end
  return self.output
end

function test:updateGradInput(input,gradOutput)
  self.gradInput = torch.Tensor(input:size()):zero()
  for i=1,3 do
    self.gradInput = self.gradInput + self:get(i):backward(input,gradOutput[i])
  end
  return self.gradInput
end

function test:accGradParameters(input,gradOutput,scale)
  for i=1,3 do
    self:get(i):zeroGradParameters()
    self:get(i):accGradParameters(input,gradOutput[i],scale)
  end
end

(To sum up briefly, the layer contains three linear layers in parallel that share all of their parameters with each other; the form of the code is borrowed from LSTM.lua etc.) So I made a very simple main script like this:

require 'nngraph'

input = nn.Identity()()
output = nn.test()(input)
model = nn.gModule({input},{output})

Then the forward and backward passes of the model work well.
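
For reference, this is roughly how I call it; the random tensors are just placeholders sized to match the nn.Linear(3,2) above:

local x = torch.randn(3)                               -- input for nn.Linear(3,2)
local outputs = model:forward(x)                       -- table of 3 outputs of size 2
local gradOutputs = {torch.randn(2), torch.randn(2), torch.randn(2)}
local gradInput = model:backward(x, gradOutputs)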

However, when I execute model:updateParameters(lr), the parameters do not change.

When I call model:getParameters(), it returns 8 parameters and 8 gradient parameters (which matches the single shared Linear(3,2): 6 weights plus 2 biases).

Also, if I call model:zeroGradParameters(), the gradient parameters again do not change. Hence, I added self:get(i):zeroGradParameters() in the accGradParameters function.

One more strange thing is that model:getParameters() works successfully, but model:parameters() does not return anything. As I understand Module.lua, model:parameters() should return the same parameters that model:getParameters() flattens. Also, if I do not require 'rnn', the call to model:getParameters() fails. I looked at the rnn code a lot but could not find why model:getParameters() succeeds when 'rnn' is required and fails when it is not.
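
From reading Module.lua, the default nn.Module:parameters() only returns tensors stored directly on the module (self.weight, self.bias and their gradients), and both updateParameters() and zeroGradParameters() loop over whatever parameters() returns. Since my layer keeps its Linear inside self.node, parameters() returns nothing, which would explain why those calls do nothing. A minimal sketch of a workaround (just forwarding parameters() to the shared node, not necessarily the right design) would be:

-- expose the shared node's parameters so that parameters(),
-- zeroGradParameters() and updateParameters(lr) can find them;
-- the step clones share this storage, so it is only counted once
function test:parameters()
  return self.node:parameters()
end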

Can you explain and help me solve the problem, please? I know this is not directly related to rnn itself, but I would really appreciate an answer.

nicholas-leonard commented 8 years ago

@yjy765 Yeah, it isn't as easy as that. The Module class isn't the best parent for this kind of module; nn.Container should be used instead. The self.modules table should be filled with either just the node or all three module clones (each approach has its advantages and disadvantages).
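
Roughly, something like this (a sketch only; the class name nn.TestContainer is illustrative, only the master node is registered in self.modules so the shared parameters are counted once, and sharedClone here is dpnn's version, loaded via rnn, which shares parameters and gradParameters by default):

require 'rnn'

local TestContainer, parent = torch.class('nn.TestContainer','nn.Container')

function TestContainer:__init()
  parent.__init(self)
  self.node = nn.Sequential():add(nn.Linear(3,2))
  self:add(self.node)              -- nn.Container collects parameters from self.modules
  self.clones = {[1] = self.node}
end

function TestContainer:get(step)
  local module = self.clones[step]
  if not module then
    module = self.node:sharedClone()   -- shares weight, bias, gradWeight, gradBias
    self.clones[step] = module
  end
  return module
end

-- updateOutput, updateGradInput and accGradParameters stay as in the original post

With this, parameters(), zeroGradParameters() and updateParameters(lr) go through nn.Container and see the shared Linear.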

yjy765 commented 8 years ago

@nicholas-leonard Thank you very much for your reply! Anyway, I found that it works if I use gradParams:zero() and the optim package rather than model:zeroGradParameters() and model:updateParameters(). Thanks, Nicholas.
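
For completeness, a sketch of that workaround (the data x and y, the criterion and the learning rate are hypothetical placeholders):

require 'optim'

local params, gradParams = model:getParameters()
local optimState = {learningRate = 0.01}

-- the model outputs a table of 3 tensors, so a table criterion is used here
local criterion = nn.ParallelCriterion()
for i = 1, 3 do criterion:add(nn.MSECriterion()) end

local x = torch.randn(3)
local y = {torch.randn(2), torch.randn(2), torch.randn(2)}

local function feval(p)
  if p ~= params then params:copy(p) end
  gradParams:zero()                        -- instead of model:zeroGradParameters()
  local output = model:forward(x)
  local loss = criterion:forward(output, y)
  model:backward(x, criterion:backward(output, y))
  return loss, gradParams
end

optim.sgd(feval, params, optimState)       -- the sgd step replaces model:updateParameters(lr)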