d2l-ai / d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
https://D2L.ai

Sudden drops in plots of loss/accuracy with extended training of a neural network #1123

Closed. NishantTharani closed this issue 4 years ago.

NishantTharani commented 4 years ago

Following along with the concise implementation of multilayer perceptrons, I then tried to train a neural network with one extra hidden layer, for 50 epochs instead of 10. The resulting plot of training loss / train acc / test acc exhibits sudden drops:

[Plot: training loss, train acc, and test acc over 50 epochs, showing a sudden drop]

It does not always look like this - sometimes there are no drops and sometimes there is a drop and then a recovery followed by another drop, etc:

[Plot: another run, showing a drop, a recovery, and then another drop]

Comments by @AnirudhDagar on a forum post I made about this indicate that it could be an issue related to the plot function.

Here is a Jupyter notebook containing the code I ran: https://github.com/NishantTharani/GitSharing/blob/master/concise_multilayer_perceptrons.ipynb
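Roughly, the setup boils down to something like the following (a minimal sketch assuming the PyTorch version of the chapter and the d2l helper names used at the time; the width of the extra hidden layer here is only illustrative, the notebook above has the exact code):

```python
import torch
from torch import nn
from d2l import torch as d2l

# Concise MLP from the chapter, with one extra hidden layer added
net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),  # the additional hidden layer
                    nn.Linear(256, 10))

def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)

# Chapter defaults at the time: lr=0.5, batch_size=256; epochs raised from 10 to 50
batch_size, lr, num_epochs = 256, 0.5, 50
loss = nn.CrossEntropyLoss()
trainer = torch.optim.SGD(net.parameters(), lr=lr)

train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
```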

AnirudhDagar commented 4 years ago

Hi @NishantTharani, I was able to verify the issue and reproduce your results. The sudden drops occur because the training loss becomes NaN after a few epochs. You can fix this easily by using a smaller learning rate, which keeps the loss under control. A good default for SGD is lr=0.01 or 0.05. Please try these and report back your results. :)
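Concretely, against the chapter's PyTorch code the change is just the learning rate passed to the optimizer, something like:

```python
# The chapter constructs the optimizer with lr=0.5; lowering it is the suggested fix.
trainer = torch.optim.SGD(net.parameters(), lr=0.05)  # or lr=0.01
```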

astonzhang commented 4 years ago

@AnirudhDagar

> 1141, sum may cause the result wrong.

This result was obtained after adding one additional layer without re-tuning hyperparameters. How could sum cause this result? Could you please be clearer? Thanks.

astonzhang commented 4 years ago

@StevenJokes see my comments in https://github.com/d2l-ai/d2l-en/pull/1176

AnirudhDagar commented 4 years ago

@goldmermaid and I discussed this issue, and she suggested it is probably due to the loss becoming NaN. I later verified that with a reduced learning rate, as can be seen in my earlier comment. This probably hints at an exploding gradients issue. What do you think, @astonzhang? I don't understand what Steven is suggesting.
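For anyone who wants to check the NaN / exploding-gradient hypothesis directly, here is a rough sketch of a manual training epoch (illustrative only, not the chapter's d2l.train_ch3 helper; the max_norm value is arbitrary):

```python
import torch

def train_epoch_clipped(net, train_iter, loss, trainer, max_norm=1.0):
    """One manual training epoch that flags NaN losses and clips gradients.
    Illustrative only; the chapter itself trains with d2l.train_ch3."""
    for X, y in train_iter:
        trainer.zero_grad()
        l = loss(net(X), y)
        if torch.isnan(l):
            print('training loss became NaN on this batch')
        l.backward()
        # Cap the gradient norm so a single bad step cannot blow up the weights
        torch.nn.utils.clip_grad_norm_(net.parameters(), max_norm=max_norm)
        trainer.step()
```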

astonzhang commented 4 years ago

@AnirudhDagar Thanks for checking. When modifying architectures, hyperparameters (e.g., lr) may need to be re-tuned.
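For example, a quick (hypothetical) way to re-tune just the learning rate, reusing the names from the sketch in the first comment:

```python
# Re-create the optimizer for each candidate learning rate and compare the curves.
# net, init_weights, loss, train_iter, test_iter, and num_epochs come from the
# sketch above; the candidate values are illustrative.
for lr in (0.5, 0.1, 0.05, 0.01):
    net.apply(init_weights)  # re-initialize weights for a fair comparison
    trainer = torch.optim.SGD(net.parameters(), lr=lr)
    d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
```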

AnirudhDagar commented 4 years ago

@astonzhang Should we update the code in the chapter to use lr=0.05 instead of lr=0.5? Or should we leave this for readers to figure out?

astonzhang commented 4 years ago

> Should we update the code in the chapter to use lr=0.05 instead of lr=0.5? Or should we leave this for readers to figure out?

When you changed it to 0.05, what acc did you get?

astonzhang commented 4 years ago

@AnirudhDagar nvm, I just tested it and modified it to 0.1.

NishantTharani commented 4 years ago

Hi @AnirudhDagar, sorry for the very late reply, and thank you for investigating. I tried to reproduce it, but for some reason couldn't step through to a point where the training loss became NaN.

In any case, I tried changing the learning rate to 0.05 and the problem went away, so I guess that's it.