I think this example does what you want to do: https://github.com/Element-Research/rnn#recurrent
Note that doing it manually restricts you to a non-nn-standard architecture, i.e. you cannot just call backward() on the whole kajiggle and get on with life.
I found it much less of a headache to just wrap the RNN part in a Sequencer and put some table layers around it, if necessary.
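For concreteness, here is a minimal, untested sketch of that suggestion (the sizes, batch size, and sequence length are made up), where a single forward/backward pair on the Sequencer covers the whole sequence:

```lua
require 'rnn'

-- hypothetical sizes
local inputSize, hiddenSize, seqLen, batchSize = 10, 20, 5, 8

-- wrap the recurrent part in a Sequencer; table layers (e.g. nn.SelectTable)
-- can be added around it if the rest of the pipeline needs them
local lstm = nn.LSTM(inputSize, hiddenSize)
local model = nn.Sequencer(lstm)

-- Sequencer expects a table of tensors, one entry per time step
local inputs, gradOutputs = {}, {}
for step = 1, seqLen do
   inputs[step] = torch.randn(batchSize, inputSize)
   gradOutputs[step] = torch.randn(batchSize, hiddenSize)  -- stand-in for criterion gradients
end

local outputs = model:forward(inputs)   -- table of per-step outputs
model:zeroGradParameters()
model:backward(inputs, gradOutputs)     -- one call handles BPTT over the whole sequence
model:updateParameters(0.1)
```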
Hello, I am dealing with this too.
I would like to do online sampling with stacked LSTMs, and it looks like it is possible to wrap them in a Sequencer for training and then use the network unwrapped (a shared clone?) for sampling, since only forward() will be necessary. Is this a good idea?
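For illustration only, the idea in this question might be sketched roughly as follows (untested; the sizes are placeholders, and sharedClone here is the parameter-sharing clone provided by dpnn, which rnn depends on):

```lua
require 'rnn'

local inputSize, hiddenSize = 10, 20

-- stacked LSTMs, trained while wrapped in a Sequencer
local stack = nn.Sequential()
   :add(nn.LSTM(inputSize, hiddenSize))
   :add(nn.LSTM(hiddenSize, hiddenSize))
local trainModel = nn.Sequencer(stack)

-- ... training with trainModel:forward()/backward() would go here ...

-- for sampling, use a parameter-sharing clone of the unwrapped stack
local sampler = stack:sharedClone()
sampler:evaluate()
for i = 1, 2 do sampler:get(i):forget() end  -- clear recurrent state before a new sequence

local x = torch.randn(1, inputSize)
for step = 1, 10 do
   local h = sampler:forward(x)     -- only forward() is needed for online sampling
   x = torch.randn(1, inputSize)    -- stand-in for mapping h to the next input
end
```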
@yoonkim @kvitajakub @kmnns If you don't want to use a Sequencer to wrap stacks of AbstractRecurrent instances (like LSTM), call model:backwardOnline(). Note that you need to call backward in the reverse order of the calls to forward; wrapping your model in model = nn.Recursor(model) will do this automatically. You most likely won't save any memory by not using a Sequencer, as this is what Sequencer does internally. To use less memory, use a smaller batch size or sequence length (rho).
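A rough, untested sketch of that advice, based on the backwardOnline section of the README linked below, with made-up sizes:

```lua
require 'rnn'

local inputSize, hiddenSize, rho = 10, 20, 5
local lstm = nn.LSTM(inputSize, hiddenSize, rho)
lstm:backwardOnline()   -- needed when calling backward() step by step

local inputs, gradOutputs, outputs = {}, {}, {}
for step = 1, rho do
   inputs[step] = torch.randn(inputSize)
   outputs[step] = lstm:forward(inputs[step])
   gradOutputs[step] = torch.randn(hiddenSize)  -- stand-in for criterion gradients
end

-- backward() must be called in the reverse order of the forward() calls
lstm:zeroGradParameters()
for step = rho, 1, -1 do
   lstm:backward(inputs[step], gradOutputs[step])
end

lstm:updateParameters(0.1)
lstm:forget()   -- reset the recurrent state before the next sequence
```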
I hit this problem too, and spent a couple of hours debugging it before figuring out that calling :backwardOnline() fixes it, so it seems worth mentioning in the docs somehow?
@hughperkins it is in the docs: https://github.com/Element-Research/rnn/blob/master/README.md#use-backwardonline. I added that header to make it more obvious to the reader.
Thanks for writing the library!
What's the best way to make multi-layer LSTMs without using the Sequencer? (I am finding Sequencer() to be slightly too memory-intensive for my dataset, so I want to do the decoder side manually, one step at a time.)
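Not an authoritative answer, but one possible shape for this (an untested sketch with placeholder sizes) is to stack the LSTM layers in an nn.Sequential and drive them one step at a time, keeping the reverse-order backward rule mentioned above:

```lua
require 'rnn'

local inputSize, hiddenSize, seqLen = 128, 256, 10

-- two-layer LSTM decoder, no Sequencer around it
local decoder = nn.Sequential()
   :add(nn.LSTM(inputSize, hiddenSize))
   :add(nn.LSTM(hiddenSize, hiddenSize))
decoder:backwardOnline()

local inputs, gradOutputs = {}, {}
for step = 1, seqLen do
   inputs[step] = torch.randn(inputSize)        -- e.g. the previous prediction
   local output = decoder:forward(inputs[step])
   gradOutputs[step] = torch.randn(hiddenSize)  -- stand-in for criterion gradients
end

-- gradients still have to flow backward through time in reverse step order
decoder:zeroGradParameters()
for step = seqLen, 1, -1 do
   decoder:backward(inputs[step], gradOutputs[step])
end
decoder:updateParameters(0.1)
```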
This is all fine, but calling BPTT via
results in the following error: