ISCAS007 / PaperReading

Computer Vision Paper Reading for ISCAS
GNU General Public License v3.0

Practical Advice for Deep Learning #10

Open · yzbx opened 5 years ago

yzbx commented 5 years ago

Training

yzbx commented 5 years ago

batch normalization

notes from an experimental paper:

Based on our experiments, we can establish that the initial learning rate is in fact the most important hyperparameter in neural network training. We agree with the recommendation provided in [2] that one should pick the largest possible learning rate that does not cause the model to diverge. Batch normalization enables us to sample possible values for the initial learning rate from a larger distribution.

After the initial learning rate is chosen, the next crucial hyperparameter is the learning rate decay. In our experiments, we found that adaptively decaying the learning rate based on the validation accuracy measured after each epoch performs strictly better than exponential or power decay. Naturally, one can find optimal parameters for power and exponential decay with cross-validation, but decaying the learning rate based on the validation accuracy is an intuitive heuristic that works very well in practice.

Regarding weight initialization, we recommend the use of variance-preserving initialization schemes such as the ones discussed in chapter 2, whether batch normalization is used or not. Specifically, we recommend using Kaiming initialization for rectified activations such as ReLU and ELU. Although saturating nonlinearities are not recommended, if they are used one should favor Xavier initialization for sigmoids and hyperbolic tangents.
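A minimal PyTorch sketch of these recommendations. The model, the `lr=0.1` value, and the `train_one_epoch`/`evaluate` stubs are illustrative assumptions, not from the paper; the point is Kaiming initialization for ReLU layers, a large initial learning rate made viable by batch normalization, and `ReduceLROnPlateau` stepping on validation accuracy after each epoch.

```python
import torch
import torch.nn as nn

# Hypothetical model: a small classifier with batch normalization,
# which tolerates larger initial learning rates.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Variance-preserving initialization: Kaiming for rectified activations
# (ReLU/ELU); for sigmoid/tanh layers one would use nn.init.xavier_uniform_.
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        nn.init.zeros_(m.bias)

# Largest initial learning rate that does not diverge (value found by trial).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Adaptive decay: shrink the learning rate when validation accuracy plateaus,
# rather than following a fixed exponential or power schedule.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.1, patience=3)

def train_one_epoch(model, optimizer):
    ...  # hypothetical stub: one pass over the training data

def evaluate(model):
    return 0.0  # hypothetical stub: returns validation accuracy

for epoch in range(30):
    train_one_epoch(model, optimizer)
    val_acc = evaluate(model)
    scheduler.step(val_acc)  # decay driven by validation accuracy
```

With `mode='max'`, the scheduler cuts the learning rate by `factor` once validation accuracy stops improving for `patience` epochs, which implements the paper's "decay based on validation accuracy" heuristic without hand-tuning a decay schedule.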