dbl001 opened this issue 6 years ago
My initial thought is that a big problem with these LSTM/RNN models is overfitting to the training data. Judging from your training curves, it's safe to say the network has definitely learned the training set, but it may not generalize to new examples, which would explain the fluctuating test accuracy.
Since this tutorial was mainly meant to expose people to NLP tasks and to using LSTMs/RNNs in TensorFlow, I didn't include these in the code, but a few things should help: adding some form of regularization, trying a plain RNN (the LSTM itself might be contributing to the overfitting), using early stopping, and splitting your data into train/validation/test instead of just train/test so you can see where validation accuracy drops off.
Hope this helps!
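For readers following along, here is a minimal sketch of what a couple of those suggestions (dropout fed through a placeholder plus crude validation-based early stopping) could look like in this tutorial's TensorFlow 1.x setup. Names such as lstmUnits, data, input_data, labels, accuracy, and optimizer are assumed to follow the notebook; getTrainBatch/getValidationBatch stand in for batching helpers over a train/validation split and are hypothetical here:

import tensorflow as tf

# Feed keep_prob as a placeholder so dropout is on during training (0.75) and off during eval (1.0).
keep_prob = tf.placeholder(tf.float32)
lstmCell = tf.contrib.rnn.BasicLSTMCell(lstmUnits)
lstmCell = tf.contrib.rnn.DropoutWrapper(cell=lstmCell, output_keep_prob=keep_prob)
value, _ = tf.nn.dynamic_rnn(lstmCell, data, dtype=tf.float32)

# Crude early stopping: quit once validation accuracy hasn't improved for `patience` checks.
iterations = 100000
best_val, patience, bad_checks = 0.0, 5, 0
for step in range(iterations):
    batch, batchLabels = getTrainBatch()
    sess.run(optimizer, {input_data: batch, labels: batchLabels, keep_prob: 0.75})
    if step % 1000 == 0:
        valBatch, valLabels = getValidationBatch()   # hypothetical helper for the held-out split
        val_acc = sess.run(accuracy, {input_data: valBatch, labels: valLabels, keep_prob: 1.0})
        bad_checks = 0 if val_acc > best_val else bad_checks + 1
        best_val = max(best_val, val_acc)
        if bad_checks >= patience:
            break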
Do you see anything wrong with my simple test (below)?
No, I don't think there's anything wrong with the code itself. I'm just saying the network has overfit to the training data and so can't answer new queries with the best accuracy. The fix is tuning hyperparameters, adding regularization, and the other things I mentioned in the post above.
"With four parameters I can fit an elephant and with five I can make him wiggle his trunk." -John von Neumann, cited by Enrico Fermi in Nature 427
What about ‘not enough training examples’?
ML, and DL especially, is notorious for having a very long list of reasons why a model might not work well, and yes, the number of training examples is definitely on that list. As for that particular quote, it has to be taken in the context of your problem space, so I don't think 100,000 examples should be a hard-and-fast rule or anything.
I have a ‘word-sense disambiguation’ question:
Word2Vec presumably captures all of a word's senses in its encoding, but each sense would shift the values/distribution of the vector. Does the LSTM, which scans each word, looks up its vector, and factors in the surrounding words as context, help to pinpoint the word's sense? This might help, e.g., in sentences with sarcasm.
Hmm, it's tough to tell whether the LSTM units would pick up on that. I think the more likely case is that when the word vectors are generated by Word2Vec, it inevitably sees a lot of examples where a word such as "flies" is used in the insect sense as well as in the verb sense, given that you're training on a large enough corpus. In a way, Word2Vec kind of "averages" the effect of seeing the word in both contexts. Check this thread for more thoughts on that: https://www.reddit.com/r/LanguageTechnology/comments/3jerqt/distinguishing_different_meanings_of_a_word/
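To see that "averaging" concretely, here is a small gensim check (not part of this repo; the GoogleNews vectors are just an example of a pretrained word2vec file) showing that a polysemous word maps to a single vector:

from gensim.models import KeyedVectors

# Any pretrained word2vec binary will do; this path is only an example.
vectors = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

# "flies" gets one 300-d vector whether a sentence means the insect or the verb,
# so both senses end up blended into a single point in the embedding space.
print(vectors['flies'].shape)                  # (300,)
print(vectors.most_similar('flies', topn=5))   # neighbours reflect a blend of the senses seen in training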
https://arxiv.org/pdf/1511.06388.pdf
I tried early stopping after 30,000 and also 50,000 iterations - not much improvement. I tried adjusting the dropout from 0.75 to 0.5 - not much improvement. Next up: regularization and replacing the LSTM with an RNN.
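For reference, swapping the LSTM for a plain RNN is essentially a one-line change in the TF 1.x graph (a sketch, assuming the tutorial's lstmUnits and data names):

import tensorflow as tf

# Plain RNN cell instead of the LSTM; everything downstream of `value` stays the same.
cell = tf.contrib.rnn.BasicRNNCell(lstmUnits)        # was tf.contrib.rnn.BasicLSTMCell
cell = tf.contrib.rnn.DropoutWrapper(cell=cell, output_keep_prob=0.5)
value, _ = tf.nn.dynamic_rnn(cell, data, dtype=tf.float32)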
LSTM with regularization is the best so far. Have you seen better numbers?
LSTM w/ regularization, dropout=0.50, 100,000 iterations
Accuracy per batch: 87.5, 79.2, 83.3, 83.3, 83.3, 66.7, 75.0, 79.2, 79.2, 83.3

RNN w/ regularization, dropout=0.50, 100,000 iterations
Accuracy per batch: 50.0, 54.2, 50.0, 62.5, 66.7, 62.5, 41.7, 50.0, 45.8, 54.2

RNN, no regularization, dropout=0.50, 100,000 iterations
Accuracy per batch: 70.8, 50.0, 58.3, 41.7, 54.2, 66.7, 66.7, 54.2, 54.2, 58.3
@dbl001 can you share your code?
Hyper-parameters:
Stopped early: 70,000 iterations
Drop-out: output_keep_prob=0.75
Regularization: regularizer = tf.contrib.layers.l2_regularizer(scale=0.1), reg_constant = 0.01
Accuracy per batch: 70.8, 79.2, 87.5, 87.5, 79.2, 95.8, 79.2, 75.0, 83.3, 62.5
What do you think?
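For anyone trying to reproduce this, one plausible way to wire those two settings into the loss in TF 1.x (a sketch, not the actual code used above; lstmUnits, numClasses, last, and labels are assumed to follow the tutorial's notebook):

import tensorflow as tf

# Attach an L2 penalty to the final dense layer's weights via get_variable's regularizer argument.
regularizer = tf.contrib.layers.l2_regularizer(scale=0.1)
weight = tf.get_variable('weight', shape=[lstmUnits, numClasses],
                         initializer=tf.truncated_normal_initializer(),
                         regularizer=regularizer)
bias = tf.Variable(tf.constant(0.1, shape=[numClasses]))
prediction = tf.matmul(last, weight) + bias   # `last` = final LSTM output, as in the notebook

# Cross-entropy plus the collected L2 penalties, scaled by reg_constant.
reg_constant = 0.01
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=labels))
loss = loss + reg_constant * tf.add_n(reg_losses)
optimizer = tf.train.AdamOptimizer().minimize(loss)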
I'm running TensorFlow 1.4.0, Anaconda Python 3.6, OS X 10.11.6, no GPU. I trained the models in my own environment:
iterations = 10
for i in range(iterations):
    nextBatch, nextBatchLabels = getTestBatch()
    print("Accuracy for this batch:",
          (sess.run(accuracy, {input_data: nextBatch, labels: nextBatchLabels})) * 100)
Accuracy per batch: 87.5, 75.0, 83.3, 95.8, 83.3, 91.7, 91.7, 79.2, 87.5, 79.2
Any ideas why the accuracy varies so much from batch to batch? I tried running against the pre-trained model, but TensorFlow 1.4.0 can't process the file. Here's my TensorBoard output:
[TensorBoard training curves]
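One note on the spread above: the raw per-batch accuracies are all multiples of 1/24, which suggests each test batch holds only 24 reviews, so any single batch is a noisy estimate. A small sketch (reusing getTestBatch, accuracy, input_data, labels, and sess from the snippet above) that averages over many batches gives a steadier number:

# Average accuracy over many small test batches instead of judging one batch at a time.
numBatches = 100
total = 0.0
for _ in range(numBatches):
    nextBatch, nextBatchLabels = getTestBatch()
    total += sess.run(accuracy, {input_data: nextBatch, labels: nextBatchLabels})
print("Mean accuracy over %d batches: %.2f%%" % (numBatches, total / numBatches * 100))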