farizrahman4u / seq2seq

Sequence to Sequence Learning with Keras
GNU General Public License v2.0

Add readout_activation param to models #199

Closed · gibrown closed this 7 years ago

gibrown commented 7 years ago

This enables avoiding a NaN loss hole during training. This workaround lets the user fix #189

Example usage:

model = Seq2Seq(input_dim=in_dim, input_length=MAXLENGTH, hidden_dim=HIDDEN_SIZE, output_length=MAXLENGTH, output_dim=out_dim, depth=LAYERS, peek=True, readout_activation='softmax')

I do not fully understand the implications of using softmax as the output activation, but in my own project (https://github.com/Automattic/wp-translate) setting the readout to softmax with this change does seem to get me past the NaN losses during training.
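For reference, a rough sketch of how this might be compiled (the loss and optimizer here are illustrative, not something this PR prescribes; with a softmax readout, a categorical cross-entropy loss is the usual pairing):

```python
from seq2seq.models import Seq2Seq

# Illustrative only: pairing the softmax readout with categorical
# cross-entropy; the optimizer choice is a placeholder.
model = Seq2Seq(input_dim=in_dim, input_length=MAXLENGTH,
                hidden_dim=HIDDEN_SIZE, output_length=MAXLENGTH,
                output_dim=out_dim, depth=LAYERS, peek=True,
                readout_activation='softmax')
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
```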

ChristopherLu commented 7 years ago

Does this change include model.add(TimeDistributed(Dense(output_dim)))?

gibrown commented 7 years ago

@ChristopherLu no, it only lets you change the activation function of the decoder output.

BTW, I have since found that this change did not completely solve my problem. Training ran longer, but I still eventually hit NaN losses. I am not sure yet whether the problem is in the data I am feeding the model or in how the network itself fits together.
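To make the distinction concrete, a TimeDistributed softmax head would look roughly like the sketch below (untested, reusing the placeholder names from my example above), whereas this PR only changes the activation of the readout the decoder already has:

```python
from keras.models import Sequential
from keras.layers import TimeDistributed, Dense
from seq2seq.models import Seq2Seq

# Untested sketch of the alternative @ChristopherLu asked about: an explicit
# TimeDistributed(Dense(...)) softmax stacked on top of the decoder, instead
# of the readout_activation parameter added by this PR.
model = Sequential()
model.add(Seq2Seq(input_dim=in_dim, input_length=MAXLENGTH,
                  hidden_dim=HIDDEN_SIZE, output_length=MAXLENGTH,
                  output_dim=out_dim, depth=LAYERS, peek=True))
model.add(TimeDistributed(Dense(out_dim, activation='softmax')))
```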

ChristopherLu commented 7 years ago

@gibrown Exactly. I eventually hit the NaN problem as the number of training epochs grew. I suspect it is a vanishing-gradient problem, and it also depends on the learning rate you set.

So can we say that the softmax activation only alleviates the NaN issue rather than fundamentally solving it?
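If it really is a gradient problem, one thing that might be worth trying is clipping gradients and lowering the learning rate (just a sketch of standard Keras options; the numbers are guesses, not anything I have tuned on this model):

```python
from keras.optimizers import RMSprop

# Guessed values: gradient clipping plus a smaller learning rate are common
# ways to push back the point where the loss blows up to NaN.
optimizer = RMSprop(lr=1e-4, clipnorm=1.0)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
```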

gibrown commented 7 years ago

@ChristopherLu yeah, that is my conclusion. Based on this thread and the problems people hit across applications, my guess is that something inherent in how the pieces of the model fit together makes it easy to end up in this state for some data.

The other possibility is that I have a bug in generating my training data, but I've been looking at that for a while and haven't found one. My next plan (when I get back to this, probably in a few weeks) is to try the TensorFlow seq2seq directly: https://www.tensorflow.org/tutorials/seq2seq

I tried that approach in the past (about a year ago) but was unable to get it to work. I think it has been significantly updated since then, and that tutorial looks improved. If it works, maybe that model could be ported into this lib.

ChristopherLu commented 7 years ago

@gibrown Thx. I'm about to try the TF seq2seq as well.

gibrown commented 7 years ago

@ChristopherLu I've reworked my application to use https://github.com/google/seq2seq and that seems to be working well so far.

ChristopherLu commented 7 years ago

@gibrown Thx for the recommendation, I will have a try.

cpury commented 7 years ago

This did not make my training converge :(

gibrown commented 7 years ago

Yeah, I don't think this is a real solution to the original problem, so I'm just going to close this PR.