d2l-ai / d2l-en

Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
https://D2L.ai

Improve evaluation results for Batch norm #2108

Closed astonzhang closed 1 year ago

astonzhang commented 2 years ago

http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/chapter_convolutional-modern/batch-norm.html

The evaluation results for batch norm need improvement in all frameworks. For example, as the text states, the model should perform better than LeNet without BN. However, the plots show that training is unstable, most visibly in the val_acc curve.

[Two screenshots of the training plots, taken 2022-04-20]

astonzhang commented 2 years ago

@AnirudhDagar feel free to share your insight here while you're working on https://github.com/d2l-ai/d2l-en/issues/2107 and https://github.com/d2l-ai/d2l-en/issues/2099

cheungdaven commented 2 years ago

@astonzhang I found that removing the batch norm after the fully connected layers stabilizes the curves.
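
For readers who want to try this comparison, here is a minimal PyTorch sketch of a LeNet-style network with batch norm after each convolution and an optional batch norm after the hidden fully connected layers. It is not the book's code: the function name `bn_lenet` and the flag `bn_on_fc` are hypothetical, and the layer sizes assume 1x28x28 inputs such as Fashion-MNIST.

```python
import torch
from torch import nn


def bn_lenet(num_classes=10, bn_on_fc=True):
    """LeNet-style net with batch norm after each conv layer.

    bn_on_fc is a hypothetical flag (not part of d2l) that toggles
    batch norm after the two hidden fully connected layers.
    Layer sizes assume 1x28x28 inputs.
    """
    def fc_block(in_features, out_features):
        layers = [nn.Linear(in_features, out_features)]
        if bn_on_fc:
            layers.append(nn.BatchNorm1d(out_features))
        layers.append(nn.Sigmoid())
        return layers

    return nn.Sequential(
        # 1x28x28 -> 6x24x24 -> 6x12x12
        nn.Conv2d(1, 6, kernel_size=5), nn.BatchNorm2d(6), nn.Sigmoid(),
        nn.AvgPool2d(kernel_size=2, stride=2),
        # 6x12x12 -> 16x8x8 -> 16x4x4
        nn.Conv2d(6, 16, kernel_size=5), nn.BatchNorm2d(16), nn.Sigmoid(),
        nn.AvgPool2d(kernel_size=2, stride=2),
        nn.Flatten(),  # 16 * 4 * 4 = 256 features
        *fc_block(256, 120),
        *fc_block(120, 84),
        nn.Linear(84, num_classes),
    )


if __name__ == "__main__":
    x = torch.randn(4, 1, 28, 28)  # dummy batch, not real data
    for flag in (True, False):
        net = bn_lenet(bn_on_fc=flag)
        print(f"bn_on_fc={flag}: output shape {tuple(net(x).shape)}")
```

Training both variants with the same hyperparameters and seed and comparing the val_acc curves would be one way to check whether the difference holds up.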

astonzhang commented 2 years ago

@cheungdaven In fact, it seems that ResNet and DenseNet have similar issues:

http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/chapter_convolutional-modern/resnet.html http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/chapter_convolutional-modern/densenet.html

although RegNet in PyTorch has smoother plots: http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/chapter_convolutional-modern/cnn-design.html

Since ResNet and DenseNet do not apply BN to the FC layers in their network heads, perhaps the cause of the plot issue lies elsewhere? Can you find any literature that supports removing BN after FC layers? If not, could you try something else?
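
One quick way to check where a given model applies BN is to count its batch norm modules by type, since BN after an FC layer is typically `nn.BatchNorm1d` while BN after a convolution is `nn.BatchNorm2d`. The sketch below is only an illustration: `bn_layer_counts` is a hypothetical helper, the toy model is not ResNet or DenseNet, and the helper assumes non-lazy layers.

```python
from collections import Counter

from torch import nn


def bn_layer_counts(model):
    """Count batch norm modules by class name anywhere in the model.

    Hypothetical helper; assumes the model uses the non-lazy
    BatchNorm1d/2d/3d classes.
    """
    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    return Counter(
        type(m).__name__ for m in model.modules() if isinstance(m, bn_types)
    )


if __name__ == "__main__":
    # Toy model with BN in the conv body and a plain linear head.
    toy = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3), nn.BatchNorm2d(8), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(8, 10),
    )
    print(dict(bn_layer_counts(toy)))
```

A model with no `BatchNorm1d` entries has no BN of that kind in its head, which narrows down where to look next.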

AnirudhDagar commented 1 year ago

Closing this, since it was fixed with the latest versions of all the frameworks.