NishantTharani closed this issue 4 years ago.
Hi @NishantTharani, I was able to verify the issue and reproduce your results. The sudden drops occur because the training loss becomes NaN after a few epochs. You can fix this easily by using a smaller learning rate, which keeps the updates under control. A good default for SGD is `lr=0.01` or `lr=0.05`. Please try these and report back your results. :)
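For reference, a minimal sketch of that change, assuming the PyTorch version of the chapter (the network definition here is illustrative, not copied from the book):

```python
import torch
from torch import nn

# Illustrative MLP; the fix is simply the smaller lr passed to SGD.
net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256), nn.ReLU(),
                    nn.Linear(256, 10))

# lr=0.05 instead of the chapter's lr=0.5 keeps the loss from blowing up to NaN.
trainer = torch.optim.SGD(net.parameters(), lr=0.05)
```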
@AnirudhDagar See #1141: `sum` may cause a wrong result.
This result is obtained after one additional layer without re-tuning hyperparameters. How could `sum` cause this result? Could you please be clearer? Thanks.
@StevenJokes see my comments in https://github.com/d2l-ai/d2l-en/pull/1176
@goldmermaid and I discussed this issue, and she suggested it is probably due to the loss becoming NaN. I later verified that with a reduced learning rate, as can be seen in my last comment. This probably hints at an exploding-gradients issue. What do you think @astonzhang? I don't understand what Steven is suggesting.
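To verify, here is a minimal sketch of the kind of NaN check one can drop into a standard PyTorch training step (not the book's code; `train_step` is a hypothetical helper):

```python
import torch

def train_step(net, X, y, loss_fn, trainer, epoch, batch_idx):
    """One SGD step that aborts as soon as the loss turns NaN."""
    l = loss_fn(net(X), y)
    if torch.isnan(l):
        # This is the symptom behind the sudden drops in the plots.
        raise RuntimeError(f'loss became NaN at epoch {epoch}, batch {batch_idx}')
    trainer.zero_grad()
    l.backward()
    trainer.step()
    return l.item()
```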
@AnirudhDagar Thanks for checking. When modifying architectures, hyperparameters (e.g., lr) may need to be re-tuned.
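For example, a quick learning-rate sweep after an architecture change might look like this (an illustrative sketch, not the chapter's code; `make_net` and `final_train_loss` are hypothetical helpers):

```python
import math
import torch
from torch import nn

def make_net():
    # Fresh copy of the (illustrative) deeper MLP for each trial.
    return nn.Sequential(nn.Flatten(),
                         nn.Linear(784, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, 10))

def final_train_loss(lr, train_iter, num_epochs=10):
    # Train with plain SGD and return the last epoch's average loss,
    # or inf if the loss ever becomes NaN.
    net, loss_fn = make_net(), nn.CrossEntropyLoss()
    trainer = torch.optim.SGD(net.parameters(), lr=lr)
    avg = math.inf
    for _ in range(num_epochs):
        total, n = 0.0, 0
        for X, y in train_iter:
            l = loss_fn(net(X), y)
            if torch.isnan(l):
                return math.inf
            trainer.zero_grad()
            l.backward()
            trainer.step()
            total, n = total + l.item() * y.numel(), n + y.numel()
        avg = total / n
    return avg

# With train_iter from d2l.load_data_fashion_mnist(batch_size):
# best_lr = min([0.5, 0.1, 0.05, 0.01],
#               key=lambda lr: final_train_loss(lr, train_iter))
```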
@astonzhang Should we update the code in the chapter to use `lr=0.05` instead of `lr=0.5`? Or should we leave this for the readers to figure out?
> Should we update the code in the chapter to use `lr=0.05` instead of `lr=0.5`? Or should we leave this for the readers to figure out?
When you changed it to 0.05, what acc did you get?
@AnirudhDagar, nvm, I just tested it and modified it to 0.1.
Hi @AnirudhDagar, sorry for the very late reply, and thank you for investigating this for me. I tried to reproduce it, but for some reason couldn't step through to a point where the training loss became NaN.
In any case, I tried changing it to 0.05 and the problem went away, so I guess that's it.
Following along with the concise implementation of multilayer perceptrons, I then tried to train a neural network with one extra hidden layer, for 50 epochs instead of 10. The resulting plot of training loss / train acc / test acc exhibits sudden drops:
It does not always look like this; sometimes there are no drops, and sometimes there is a drop and then a recovery, followed by another drop, etc.:
Comments by @AnirudhDagar on a forum post I made about this indicate that it could be an issue related to the plot function.
Here is a Jupyter notebook containing the code I ran: https://github.com/NishantTharani/GitSharing/blob/master/concise_multilayer_perceptrons.ipynb
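For anyone without the notebook, here is a sketch of the setup described above, assuming the PyTorch version of the chapter and the `d2l` package (the width of the extra hidden layer is my assumption, not taken from the notebook):

```python
import torch
from torch import nn
from d2l import torch as d2l

# The chapter's MLP with one extra hidden layer
# (the 256-unit width of the second hidden layer is an assumption).
net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),
                    nn.Linear(256, 10))

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)

# Same settings as the issue: the chapter's lr=0.5, but 50 epochs instead of 10.
batch_size, lr, num_epochs = 256, 0.5, 50
loss = nn.CrossEntropyLoss()
trainer = torch.optim.SGD(net.parameters(), lr=lr)
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
```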