Element-Research / rnn

Recurrent Neural Network library for Torch7's nn
BSD 3-Clause "New" or "Revised" License

error when using stacked LSTMs without Sequencer? #80

Closed · yoonkim closed this issue 8 years ago

yoonkim commented 8 years ago

Thanks for writing the library!

What's the best way to make multi-layer LSTMs without using the Sequencer? (I am finding Sequencer() to be slightly too memory intensive for my dataset, so I want to do the decoder side manually, one step at a time.)

require 'rnn'  -- needed for nn.FastLSTM

model = nn.Sequential():add(nn.FastLSTM(10,10)):add(nn.FastLSTM(10,10))
x1 = torch.randn(5,10)
x2 = torch.randn(5,10)
-- step through two time-steps manually
model:forward(x1)
model:backward(x1, torch.randn(5,10))
model:forward(x2)
model:backward(x2, torch.randn(5,10))

This is all fine, but calling BPTT via

model:backwardThroughTime()

results in the following error:

../torch/install/share/lua/5.1/nn/CMulTable.lua:37: inconsistent tensor size at /tmp/luarocks_torch-scm-1-9349/torch7/lib/TH/generic/THTensorCopy.c:7
stack traceback:

        [C]: in function 'copy'
        .../torch/install/share/lua/5.1/nn/CMulTable.lua:37: in function 'updateGradInput'
        .../torch/install/share/lua/5.1/nn/Sequential.lua:55: in function 'updateGradInput'
        ...torch/install/share/lua/5.1/nn/ConcatTable.lua:35: in function 'updateGradInput'
        ../torch/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
        .../torch/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
        ../torch/install/share/lua/5.1/rnn/LSTM.lua:214: in function 'backwardThroughTime'

ghost commented 8 years ago

I think this example does what you want to do: https://github.com/Element-Research/rnn#recurrent

Note that doing it manually restricts you to a non-nn-standard architecture, i.e. you cannot just call backward() on the whole kajiggle and get on with life.

I found it much less of a headache to just wrap the RNN part in a Sequencer and put some table layers around it, if necessary.
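
For concreteness, here is a rough sketch of that wrapping (the SplitTable step and the sizes are just illustrative, not taken from the linked example):

require 'rnn'

lstm = nn.Sequential():add(nn.FastLSTM(10,10)):add(nn.FastLSTM(10,10))

-- SplitTable turns a seqlen x batch x size tensor into the table of
-- time-steps that Sequencer expects
model = nn.Sequential():add(nn.SplitTable(1)):add(nn.Sequencer(lstm))

input = torch.randn(6, 5, 10)        -- seqlen=6, batch=5, size=10
output = model:forward(input)        -- table of 6 tensors, each 5x10

gradOutput = {}
for t = 1, 6 do gradOutput[t] = torch.randn(5, 10) end
model:backward(input, gradOutput)    -- plain backward works on the whole thing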

kvitajakub commented 8 years ago

Hello, I am dealing with this too.

I would like to do online sampling with stacked LSTMs, and it looks like it is possible to wrap them in a Sequencer for training and then use the network unwrapped (a shared clone?) for sampling, since only forward() will be necessary. Is this a good idea?
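
Roughly what I mean, as a sketch (the use of sharedClone and the feedback loop are just my guesses, and the shapes are made up):

require 'rnn'

lstm = nn.Sequential():add(nn.FastLSTM(10,10)):add(nn.FastLSTM(10,10))

-- training: whole sequences go through a Sequencer
trainer = nn.Sequencer(lstm)

-- sampling: a parameter-sharing clone of the stack, stepped one input at a time
sampler = lstm:sharedClone()
sampler:evaluate()

x = torch.randn(1, 10)
for step = 1, 20 do
   x = sampler:forward(x)   -- output fed back in as the next input
end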

nicholas-leonard commented 8 years ago

@yoonkim @kvitajakub @kmnns If you don't want to use a Sequencer to wrap stacks of AbstractRecurrent instances (like LSTM), call model:backwardOnline(). Note that you then need to call backward in the reverse order of the calls to forward. Wrapping your model in model = nn.Recursor(model) will do this automatically. You most likely won't save any memory by not using a Sequencer, as this is what Sequencer does internally. To use less memory, use a smaller batch size or sequence length (rho).
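
For example, a minimal sketch of the manual pattern (the batch size, rho and learning rate below are arbitrary placeholders):

require 'rnn'

lstm = nn.Sequential():add(nn.FastLSTM(10,10)):add(nn.FastLSTM(10,10))
model = nn.Recursor(lstm, 5)   -- keep at most rho=5 steps for BPTT

inputs, gradOutputs = {}, {}
for step = 1, 5 do
   inputs[step] = torch.randn(5, 10)
   gradOutputs[step] = torch.randn(5, 10)
   model:forward(inputs[step])
end

-- depending on the rnn version, model:backwardOnline() may need to be
-- called once before the reverse-order backward calls mentioned above

-- backward in reverse order of the forward calls
for step = 5, 1, -1 do
   model:backward(inputs[step], gradOutputs[step])
end

model:updateParameters(0.1)
model:forget()   -- reset the stored steps before starting the next sequence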

hughperkins commented 8 years ago

I hit this problem too and spent a couple of hours debugging it before figuring out that calling :backwardOnline() fixes it. Maybe this is worth mentioning in the docs somehow?

nicholas-leonard commented 8 years ago

@hughperkins It is in the docs: https://github.com/Element-Research/rnn/blob/master/README.md#use-backwardonline. I added that header to make it more obvious to the reader.