Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

Bad size of gradInput in BiSequencerLM #418

Open saztorralba opened 7 years ago

saztorralba commented 7 years ago

Hi

I'm trying to use BiSequencerLM to train a network on sequences of different lengths, but I'm running into an issue with the gradInput of the BiSequencerLM module. When a sequence is shorter than the previous one, self.gradInput in the function below ends up with as many elements as the previous sequence, not the current one.

function BiSequencerLM:updateGradInput(input, gradOutput)
   local nStep = #input

   self._mergeGradInput = self._merge:updateGradInput(self._mergeInput, gradOutput)
   self._fwdGradInput = self._fwd:updateGradInput(_.first(input, nStep - 1), _.last(self._mergeGradInput[1], nStep - 1))
   self._bwdGradInput = self._bwd:updateGradInput(_.last(input, nStep - 1), _.first(self._mergeGradInput[2], nStep - 1))

   -- add fwd rnn gradInputs to bwd rnn gradInputs
   for i=1,nStep do
      if i == 1 then
         self.gradInput[1] = self._fwdGradInput[1]
      elseif i == nStep then
         self.gradInput[nStep] = self._bwdGradInput[nStep-1]
      else
         self.gradInput[i] = nn.rnn.recursiveCopy(self.gradInput[i], self._fwdGradInput[i])
         nn.rnn.recursiveAdd(self.gradInput[i], self._bwdGradInput[i-1])
      end
   end
   return self.gradInput
end
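For context, here is a minimal reproduction sketch of what I mean. It is only illustrative: the sizes are arbitrary, it assumes a plain nn.LSTM wrapped directly in nn.BiSequencerLM with the default merge, and it uses non-batched per-step tensors.

require 'rnn'

-- Minimal reproduction sketch (illustrative sizes, non-batched inputs).
local inputSize, hiddenSize = 4, 3
local blm = nn.BiSequencerLM(nn.LSTM(inputSize, hiddenSize))

local function makeSeq(nStep)
   local seq = {}
   for i = 1, nStep do seq[i] = torch.randn(inputSize) end
   return seq
end

-- backward through a 5-step sequence, then a shorter 3-step one
for _, nStep in ipairs{5, 3} do
   local input = makeSeq(nStep)
   local output = blm:forward(input)
   local gradOutput = {}
   -- build gradOutput with the same per-step sizes as the output
   for i = 1, nStep do gradOutput[i] = output[i]:clone():normal() end
   blm:backward(input, gradOutput)
   -- I would expect #blm.gradInput == nStep, but after the 3-step
   -- sequence it still holds the entries from the 5-step sequence
   print(nStep, #blm.gradInput)
end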

I believe this is caused by self.gradInput not being recreated for the new sequence, and hence keeping the length of the previous sequence. This causes an error when there are further modules to backpropagate through, because their gradOutput will have the wrong size (different from their input). The issue can be fixed by setting gradInput to an empty table before the call to updateGradInput. I can do this by accessing the module from my code, but maybe it would be better to just add this line of code

self.gradInput={}

before the for loop (something equivalent would have to be done if working with Tensors instead of Tables).
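As a stop-gap, this is roughly the workaround I use from my own code. It is just a sketch; backwardVarLen is a hypothetical helper name, not anything in the library.

-- Workaround sketch: clear the stale per-step entries before each backward
-- pass so downstream modules get a gradInput table sized for the current
-- sequence. "backwardVarLen" is a hypothetical helper, not part of rnn.
local function backwardVarLen(blm, input, gradOutput)
   blm.gradInput = {}   -- drop entries left over from a longer previous sequence
   return blm:backward(input, gradOutput)
end

-- used in place of blm:backward(input, gradOutput) in the training loop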

Or maybe this is expected behavior and I'm doing something wrong, in which case any advice is appreciated. Thanks!

murthyrudra commented 7 years ago

Hi, I'm facing the same issue when training a language model (sort of). Please suggest how I can resolve it. Additionally, I'm using the optim package for optimization.