Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

How to finetune LSTM models with different batchsize / sequence length compared with pretraining? #348

Closed: bearpaw closed this issue 7 years ago

bearpaw commented 7 years ago

Hi all,

I want to finetune a model that was trained with batch size 8, using a different batch size during finetuning. The forward pass works without problems, but the backward pass throws an error:

/home/wyang/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 5 module of nn.Sequential:
/home/wyang/torch/install/share/lua/5.1/nn/ConcatTable.lua:55: bad argument #2 to 'add' (sizes do not match at /home/wyang/torch/extra/cutorch/lib/THC/generic/THCTensorMathPointwise.cu:10)
stack traceback:
    [C]: in function 'add'
    /home/wyang/torch/install/share/lua/5.1/nn/ConcatTable.lua:55: in function 'f'
    /home/wyang/torch/install/share/lua/5.1/nn/ConcatTable.lua:21: in function 'retable'
    /home/wyang/torch/install/share/lua/5.1/nn/ConcatTable.lua:52: in function </home/wyang/torch/install/share/lua/5.1/nn/ConcatTable.lua:30>
    [C]: in function 'xpcall'
    /home/wyang/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
    /home/wyang/torch/install/share/lua/5.1/nn/Sequential.lua:55: in function 'updateGradInput'
    ./models/LSTM/UntiedConvLSTM.lua:134: in function '_updateGradInput'
    ...ng/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:59: in function 'updateGradInput'
    /home/wyang/torch/install/share/lua/5.1/nngraph/gmodule.lua:420: in function 'neteval'
    /home/wyang/torch/install/share/lua/5.1/nngraph/gmodule.lua:454: in function 'updateGradInput'
    /home/wyang/torch/install/share/lua/5.1/rnn/Recursor.lua:45: in function '_updateGradInput'
    ...ng/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:59: in function 'updateGradInput'
    /home/wyang/torch/install/share/lua/5.1/rnn/Sequencer.lua:121: in function 'updateGradInput'
    /home/wyang/torch/install/share/lua/5.1/nn/Module.lua:31: in function 'backward'
    ./trainregression-rnn.lua:222: in function 'train'
    mainpose-rnn.lua:55: in main chunk
    [C]: in function 'dofile'
    ...yang/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

I tried removing the optimState, but the error still occurs. The model is wrapped in a Sequencer:

    -- Final model
    local model = nn.gModule({inp}, out)
    model = nn.Sequencer(model)
    -- retain hidden state across calls in both training and evaluation
    model:remember('both')

    return model:cuda()
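
(For context, a minimal sketch assuming the stock rnn Sequencer API, not code from the original post: with remember('both') the Sequencer keeps hidden states, and during training cached gradients, shaped for the previous batch, so they should be cleared before feeding a batch of a different size.)

    -- sketch assuming the standard rnn API; prepareBatch is a hypothetical
    -- helper that returns a batch with the new batch size
    model:forget()  -- drop remembered hidden states and cached gradients
    local batch = prepareBatch(newBatchSize)
    local output = model:forward(batch)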

The criterion is likewise wrapped, with SequencerCriterion:

    self.criterion = nn.SequencerCriterion(criterion)
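
(For reference, a minimal usage sketch of SequencerCriterion with illustrative shapes, not taken from the original post: it applies the wrapped criterion step by step over a table of outputs, so output and target tables must agree in sequence length and per-step batch size.)

    require 'rnn'  -- pulls in nn and the Sequencer family
    -- illustrative shapes only: seqlen = 2, batch = 8, feature dim = 10
    local crit = nn.SequencerCriterion(nn.MSECriterion())
    local outputs = { torch.randn(8, 10), torch.randn(8, 10) }
    local targets = { torch.randn(8, 10), torch.randn(8, 10) }
    local loss = crit:forward(outputs, targets)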

Any suggestions about this issue? Thanks!

bearpaw commented 7 years ago

Sorry, I found the reason: UntiedConvLSTM.lua hard-codes the batchSize.
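
A hypothetical illustration of the fix (the actual UntiedConvLSTM.lua source is not part of the thread; the method and field names below are invented): instead of caching the batch size when the module is constructed, derive it from the incoming tensor and reallocate the state whenever it changes.

    -- hypothetical sketch; initState, outputSize, height, width are invented names
    function UntiedConvLSTM:initState(input)
       local batchSize = input:size(1)  -- read the batch size from the current input
       if not self.cell or self.cell:size(1) ~= batchSize then
          -- reallocate hidden/cell state whenever the batch dimension changes
          self.cell = input.new(batchSize, self.outputSize, self.height, self.width):zero()
          self.hidden = input.new(batchSize, self.outputSize, self.height, self.width):zero()
       end
       return self.hidden, self.cell
    end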