Closed: howardyclo closed this issue 6 years ago
Did you try tuning hyperparameters, like the initial weights, optimizer, and learning rate? The values used in GRU models in other work may help.
@helson73 OK, I found the problem. I re-ran the experiment with -max_grad_norm set back to the default value (5), and the GRU performance became fine (the training loss can decrease). But I don't know why -max_grad_norm affects the GRU unit so much. (LSTM is fine: in my experiment I tuned -max_grad_norm from 5 to 2 with the LSTM unit, and the performance became slightly better.)
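For context, -max_grad_norm controls gradient-norm clipping: before each update, the global L2 norm of the gradients is rescaled down to the threshold if it exceeds it. Here is a minimal self-contained sketch of that operation (the function `clip_grad_norm` below is a simplified stand-in for what the framework does internally, e.g. `torch.nn.utils.clip_grad_norm_` in PyTorch, operating on a flat list of gradient values for illustration):

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale a list of gradient values so their global L2 norm
    does not exceed max_norm; leave them unchanged otherwise."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# An "exploding" gradient (norm 50) gets rescaled to norm 5;
# a small gradient passes through untouched.
print(clip_grad_norm([30.0, 40.0], max_norm=5.0))  # [3.0, 4.0]
print(clip_grad_norm([0.3, 0.4], max_norm=5.0))    # [0.3, 0.4]
```

A smaller threshold (e.g. 2 instead of 5) clips more aggressively, which shrinks the effective step size whenever gradients spike; that may explain why the same value behaves differently for GRU and LSTM.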
It's just my experience, but GRUs often seem more sensitive to hyperparameters; for instance, vanilla SGD frequently doesn't work for GRUs, while it's fine for LSTMs.
@helson73 Thanks for the observation. The reason I chose to try the GRU unit here is that several grammatical error correction papers use GRU-based seq2seq models. But it seems GRU needs more tuning. Um... I'll go with LSTM (lol)
They use GRU because it's faster and simpler; using the hyperparameters they mention should work.
This is an interesting discussion. Our default hyperparameters are for LSTM; let's add a note suggesting some changes for GRU.
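In the spirit of that note, here is an illustrative training invocation with GRU-friendlier settings, based on the discussion above; the flag values are a sketch, not tested defaults, and data/model paths are placeholders:

```shell
# Illustrative only: keep the default clipping norm (5) rather than
# lowering it, and prefer Adam over vanilla SGD, which per the
# discussion above can fail to train GRUs.
python train.py -data data/demo -save_model demo-model \
    -rnn_type GRU \
    -max_grad_norm 5 \
    -optim adam -learning_rate 0.001
```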
Hi, sorry, but may I know whether you wrote the Validation GLEU avg score code yourself, please?
Hello, my task is grammatical error correction, where the source and target sentences are in the same language. Here is an example: the source is the erroneous tokenized sentence, e.g. "I loves opennmt .", and the target is the corrected tokenized sentence, e.g. "I love opennmt ."
I've tested several models using an LSTM-based encoder/decoder, and the performance is fine (the training accuracy reaches 90% and the losses are generally low).
But when I use a GRU-based encoder/decoder, the model seems unable to fit the data, which is very weird... I suspect there might be a bug in the GRU?
Here is my script (the only change is -rnn_type from LSTM to GRU, and the performance became a lot worse):

[script not preserved]

And here is my training log:

[training log not preserved]
The above log shows that the accuracy keeps decreasing and the perplexity keeps increasing...
And this is part of my model's output for validation data (the output seems to be fine):