Closed astonzhang closed 1 year ago
@AnirudhDagar feel free to share your insight here while you're working on https://github.com/d2l-ai/d2l-en/issues/2107 and https://github.com/d2l-ai/d2l-en/issues/2099
@astonzhang I found that removing batch norm from the fully-connected layers makes the curve stable.
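For concreteness, here is a minimal sketch (assuming PyTorch) of the chapter's BN-LeNet with the `BatchNorm1d` layers dropped from the fully-connected head; layer sizes follow the standard LeNet used in the book, and the exact architecture is an assumption on my part:

```python
import torch
from torch import nn

# BN-LeNet with batch norm kept in the conv stages but removed
# from the fully-connected head (the change suggested above).
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.BatchNorm2d(6), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(6, 16, kernel_size=5), nn.BatchNorm2d(16), nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    # No BatchNorm1d after the Linear layers:
    nn.Linear(16 * 4 * 4, 120), nn.Sigmoid(),
    nn.Linear(120, 84), nn.Sigmoid(),
    nn.Linear(84, 10),
)

# Quick shape check on a Fashion-MNIST-sized batch
X = torch.randn(2, 1, 28, 28)
print(net(X).shape)  # torch.Size([2, 10])
```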
@cheungdaven In fact, it seems that ResNet and DenseNet have similar issues:
http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/chapter_convolutional-modern/resnet.html
http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/chapter_convolutional-modern/densenet.html
although RegNet in PyTorch has smoother plots: http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/chapter_convolutional-modern/cnn-design.html
Since ResNet and DenseNet do not apply BN to the FC layers in their network heads, perhaps the plot issue lies somewhere else? Can you find any literature that supports removing BN after FC layers? If not, could you try something else?
Closing this since it was fixed in the latest versions of all the frameworks.
http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/chapter_convolutional-modern/batch-norm.html
The evaluation results for batch norm need improvement across all frameworks. For example, as the text states, performance should be better than that of LeNet without BN. However, training looks unstable in the plots, e.g., in the val_acc curve.