Open roaminggypsy opened 6 years ago
Yeah, it definitely looks like overfitting. Something I didn't do in the tutorial is create a separate testing set of examples that the network doesn't get trained on. Once you create that set, you can evaluate the network every x iterations to see its predictions on those testing examples. Once the testing error stops decreasing, that is a good place to stop.
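That evaluate-every-x-iterations idea can be sketched as a generic loop. This is just an illustration, not the tutorial's code: `train_step` and `eval_error` are hypothetical stand-ins for one training step and one test-set evaluation.

```python
def train_with_early_stopping(train_step, eval_error, max_iters=100000,
                              eval_every=1000, patience=3):
    """Train, evaluating on the held-out test set every `eval_every`
    iterations; stop once the test error hasn't improved for
    `patience` consecutive evaluations."""
    best_error = float("inf")
    bad_evals = 0
    for i in range(1, max_iters + 1):
        train_step()
        if i % eval_every == 0:
            err = eval_error()
            if err < best_error:
                best_error, bad_evals = err, 0
            else:
                bad_evals += 1
                if bad_evals >= patience:
                    return i, best_error  # test error stopped decreasing
    return max_iters, best_error
```

With `patience=3` this tolerates a few noisy evaluations before stopping, rather than halting on the first uptick.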
Several people have also told me that using a complex LSTM model for a binary task like sentiment analysis is probably overkill, and that simpler models would likely be a better approach.
Thanks for your reply!
I reduced the iterations from 100000 to 50000, and it seems good:
Accuracy for this batch: 81.99999928474426
Accuracy for this batch: 82.99999833106995
Accuracy for this batch: 81.99999928474426
Accuracy for this batch: 83.99999737739563
Accuracy for this batch: 81.00000023841858
Accuracy for this batch: 83.99999737739563
Accuracy for this batch: 80.0000011920929
Accuracy for this batch: 79.00000214576721
Accuracy for this batch: 83.99999737739563
Accuracy for this batch: 91.00000262260437
😂I might try to implement early-stopping when I become more familiar with Python and DL.
Also, I think there might be a mistake 🧐
numDimensions = 300 #Dimensions for each word
numDimensions should be 50, because the tutorial uses the 50-dimensional GloVe vectors rather than 300-dimensional Word2Vec ones.
First, thanks for your tutorial. I'm new to NLP and DL. You explained things well.
Here is my problem: using the provided code, I trained the network (I didn't use the pre-trained model). The model's accuracy and loss curves during training can be found below. It seems that the model learned the training data well.
But the model doesn't perform well on the testing data:
Accuracy for this batch: 33.33333432674408
Accuracy for this batch: 58.33333134651184
Accuracy for this batch: 37.5
Accuracy for this batch: 54.16666865348816
Accuracy for this batch: 50.0
Accuracy for this batch: 58.33333134651184
Accuracy for this batch: 45.83333432674408
Accuracy for this batch: 50.0
Accuracy for this batch: 37.5
Accuracy for this batch: 62.5
Is this overfitting? How can I improve it? I read some issues, and it seems there are several things to try: early stopping / tuning hyper-parameters / removing the LSTM ... Which one should I try first?
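On the "simpler models" suggestion from earlier in the thread: a common baseline is to average the GloVe vectors of each review's words and fit a logistic regression on top. A minimal NumPy sketch (plain batch gradient descent, not the tutorial's code; all names here are illustrative):

```python
import numpy as np

def average_embedding(tokens, vectors, dim=50):
    """Mean of the word vectors found in the vocabulary; zeros if none match."""
    found = [vectors[t] for t in tokens if t in vectors]
    return np.mean(found, axis=0) if found else np.zeros(dim, dtype=np.float32)

def train_logreg(X, y, lr=0.5, epochs=500):
    """Binary logistic regression trained with batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        grad = p - y                            # gradient of the log-loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)
```

A baseline like this trains in seconds and gives a sanity-check accuracy to beat before reaching for an LSTM.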