I love these corner cases. Looking into it.
Fixed in https://github.com/Element-Research/rnn/commit/e1f0c5049b8c41ad952d7b252b74035987b4a02b. Thank you for the detailed bug report!
Thank you for the quick solution!
Hi,
I've noticed a strange problem when using the new version. Before the update that introduced the new remember/forget mechanism, I was training a model with LSTMs (in a Sequencer) and got good training behaviour: the training error decreased continuously, and the validation error decreased for a while and then converged. After the update, the training error stops decreasing after a couple of epochs and starts increasing.
I know this is a very high-level description, but do you have any idea what might have changed? I suspect there is a problem with the gradients, although all the tests pass.
Thanks for your help.
Let me start an LSTM experiment on PennTreeBank to see how it does.
Seems to work on my end. Do you have a particular use case you want me to test? Like remember('eval') with an LSTM?
Hi,
It's difficult to debug and narrow down the problem to a simple example, but I'll try. The general symptom is that before the change I was seeing good convergence and decreasing training errors, and after the change I see the training error first decreasing, then increasing continuously. I'll see if I can recreate the problem in a simple example.
Thanks boknilev.
Hi,
I still can't track down the source of the problem. I did notice the following behaviour: After the code update, my gradients are much larger than before (an order of magnitude larger). I believe the problem may be with some nn code update and not with rnn, because I tried reverting to a previous rnn version (by downloading the .lua files from the repo) and still had the problem.
Do you have any suggestion as to what might cause this behaviour? Do you know how to revert the nn package to a previous state? I can't simply download the .lua files from a previous repo version, because the package also needs to be compiled, which I only know how to do with "luarocks install nn".
Thanks for your help.
git clone git@github.com:torch/nn.git
cd nn
git checkout [commit hash]
luarocks make rocks/[tab]
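Once inside the nn directory, git log shows the commit hashes you can check out; the rockspec that [tab] completes to should be the one in the rocks/ directory (likely rocks/nn-scm-1.rockspec, though the exact filename may differ between versions).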
Maybe you should try a smaller learning rate. What does your model look like?
Hi,
Thanks. I'll try that.
Yes, a smaller learning rate and a larger dropout rate help a bit. I still don't get performance as good as I had prior to the code update, though.
It's an LSTM autoencoder for sentences, à la sequence-to-sequence learning. The model is roughly an LSTM encoder and an LSTM decoder, with dropout layers and a softmax over the decoded words. The encoder and decoder are currently coupled by using the final output of the encoder as the first input to the decoder, although I'm aware of issue https://github.com/Element-Research/rnn/issues/16.
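Roughly something like this (a sketch only; the actual dimensions, vocabulary size, and the exact way the two halves are wired together differ in my code):

    require 'rnn'

    local vocabSize, hiddenSize = 10000, 256  -- illustrative sizes

    -- encoder: word embeddings -> dropout -> LSTM
    local enc = nn.Sequential()
       :add(nn.Sequencer(nn.LookupTable(vocabSize, hiddenSize)))
       :add(nn.Sequencer(nn.Dropout(0.5)))
       :add(nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize)))

    -- decoder: LSTM -> dropout -> linear -> log-softmax over the vocabulary
    local dec = nn.Sequential()
       :add(nn.Sequencer(nn.LSTM(hiddenSize, hiddenSize)))
       :add(nn.Sequencer(nn.Dropout(0.5)))
       :add(nn.Sequencer(nn.Linear(hiddenSize, vocabSize)))
       :add(nn.Sequencer(nn.LogSoftMax()))

    -- loss on the decoded words
    local criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())

    -- coupling: the final output of enc is used as the first input to dec
    -- (rather than copying the LSTM hidden/cell state, cf. issue #16)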
You really have to find an older version where your code was working. You could also try older versions of the rnn package if trying older versions of nn doesn't work.
Closing for now.
Hi,
Recent changes to the Sequencer remember/forget mechanism introduced modes like "both" and "eval", which are very convenient. However, in "eval" mode, a forward step during evaluation sets the maximum number of BPTT steps (the rho value) to the length of the input sequence. A subsequent epoch of training on a sequence of a different length then fails in the backward step. Before the change, remember() worked fine.
The reason is probably that rho is set on the recurrent module (in this case the LSTM), which causes the backward step during training to stop before reaching the beginning of the sequence; see LSTM:updateGradInputThroughTime().
Note: I know that the README says it is recommended to set mode="both" for LSTM, but I prefer the "eval" mode because each training example is independent. In any case, I suppose both modes should be possible for any AbstractRecurrent instance.
A minimal working example with LSTMs:
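Something along these lines reproduces the failure (a sketch only; the sizes, criterion, and random data are illustrative, not my actual code):

    require 'rnn'

    local inputSize, hiddenSize, batchSize = 4, 4, 2

    -- Sequencer-wrapped LSTM; remember('eval') keeps the hidden state across
    -- forward calls in evaluation mode only
    local lstm = nn.Sequencer(nn.LSTM(inputSize, hiddenSize))
    lstm:remember('eval')

    local criterion = nn.SequencerCriterion(nn.MSECriterion())

    -- helper: a table of `len` random time steps
    local function makeSeq(len)
       local seq = {}
       for t = 1, len do seq[t] = torch.randn(batchSize, inputSize) end
       return seq
    end

    -- evaluation pass on a sequence of length 3: this sets rho to 3
    lstm:evaluate()
    lstm:forward(makeSeq(3))

    -- training pass on a sequence of length 5
    lstm:training()
    local inputs, targets = makeSeq(5), makeSeq(5)
    local outputs = lstm:forward(inputs)
    criterion:forward(outputs, targets)
    lstm:zeroGradParameters()
    -- the backward step fails: BPTT stops after rho = 3 steps instead of 5
    lstm:backward(inputs, criterion:backward(outputs, targets))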
Could you look into that?
Many thanks for your help.