[ Frank Discussion 1 ] Training Strategy

You are welcome to discuss training strategies here.

There are two typical training strategies, "SGD + Reduce Learning Rate on Plateau" and "Adam + Warm Restarts".

SGD + Reduce Learning Rate on Plateau

(1) Training is slow, but the resulting model tends to generalize well. (2) The parameters of ReduceLROnPlateau, such as patience and the learning-rate reduction factor, must be set carefully. ......
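The plateau rule can be sketched in plain Python to show how patience and the reduction factor interact. This is a simplified re-implementation for illustration only. The class name `PlateauLRSketch` is hypothetical. The sketch assumes the monitored value is a validation loss, where lower is better. It also leaves out the `threshold` and `cooldown` options of PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau`. With PyTorch itself you would call `scheduler.step(val_loss)` once per epoch.

```python
class PlateauLRSketch:
    """Hypothetical, simplified reduce-on-plateau rule (mode='min', no threshold/cooldown)."""

    def __init__(self, lr, factor=0.1, patience=2, min_lr=1e-6):
        self.lr = lr                # current learning rate
        self.factor = factor        # multiply lr by this on a plateau
        self.patience = patience    # non-improving epochs tolerated before reducing
        self.min_lr = min_lr        # lower bound on the learning rate
        self.best = float("inf")
        self.num_bad_epochs = 0

    def step(self, val_loss):
        # Track the best loss seen so far; count epochs without improvement.
        if val_loss < self.best:
            self.best = val_loss
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        # Reduce only after more than `patience` bad epochs in a row.
        if self.num_bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.num_bad_epochs = 0
        return self.lr


sched = PlateauLRSketch(lr=0.1, factor=0.1, patience=2)
lrs = [sched.step(loss) for loss in [1.0, 0.8, 0.8, 0.8, 0.8]]
# lrs is approximately [0.1, 0.1, 0.1, 0.1, 0.01]
```

With `patience=2`, the learning rate is cut only on the third consecutive epoch without improvement. Patience and the factor therefore jointly decide how long training stays at each learning rate, which is one reason this strategy trains slowly.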

Adam + Warm Restarts

(1) It is not clear how to set the period T for Warm Restarts. (2) It is hard to decide how many restarts there should be. ......
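This sketch assumes the common cosine schedule with warm restarts, which is the form used by PyTorch's `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`. Within the current cycle the learning rate is eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i)). The first cycle lasts T_0 epochs, and each later cycle is T_mult times longer than the previous one. The plain-Python version below uses the hypothetical function names `sgdr_lr` and `num_restarts`. Counting restarts for a fixed epoch budget shows how T_0 and T_mult together determine the number of restarts, which links points (1) and (2).

```python
import math


def sgdr_lr(epoch, eta_max, eta_min=0.0, T_0=10, T_mult=2):
    """Learning rate at a given (possibly fractional) epoch under cosine annealing with warm restarts."""
    T_i, T_cur = T_0, epoch
    # Walk through completed cycles; each restart multiplies the cycle length by T_mult.
    while T_cur >= T_i:
        T_cur -= T_i
        T_i *= T_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * T_cur / T_i))


def num_restarts(total_epochs, T_0=10, T_mult=2):
    """Number of restarts (jumps back to eta_max) strictly before total_epochs."""
    count, T_i, boundary = 0, T_0, T_0
    while boundary < total_epochs:
        count += 1
        T_i *= T_mult
        boundary += T_i
    return count


print(num_restarts(70, T_0=10, T_mult=2))  # restarts at epochs 10 and 30
```

With a 70-epoch budget, `T_0=10` and `T_mult=2` give two restarts, at epochs 10 and 30, and the third cycle ends exactly at epoch 70. Choosing T_0 and T_mult so that training ends at the bottom of a cycle is one way to avoid stopping right after a restart, when the learning rate is high again.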

In fact, I am still not sure how the value of weight decay influences the results under each of these two strategies. Are there any other factors that determine the final performance when comparing the two strategies?
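One known mechanism may be relevant to the weight-decay question. For plain SGD, adding an L2 term `wd * w` to the gradient is equivalent to shrinking the weights directly. For Adam, the L2 term passes through the adaptive normalization, so its effect no longer scales with `wd` in the same way. PyTorch's `torch.optim.Adam` by default adds the `weight_decay` term to the gradient. `torch.optim.AdamW` applies decoupled weight decay outside the normalization instead. The toy sketch below illustrates only this difference, on one scalar parameter and for the first step only. The function names are hypothetical, and the zero data gradient is chosen only to isolate the decay term. It says nothing about which choice gives better final performance.

```python
import math


def sgd_step(w, grad, lr, wd):
    # SGD with an L2 term: identical to shrinking w by lr * wd * w.
    return w - lr * (grad + wd * w)


def adam_first_step(w, grad, lr, wd, decoupled, eps=1e-8):
    """First Adam step on a scalar parameter, starting from zero moment estimates."""
    # Coupled (L2) variant: the decay term is added to the gradient before normalization.
    g = grad if decoupled else grad + wd * w
    # After bias correction on step 1, m_hat == g and v_hat == g**2 for any beta1, beta2.
    m_hat, v_hat = g, g * g
    w_new = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        # Decoupled (AdamW-style) variant: shrink w directly, outside the normalization.
        w_new -= lr * wd * w
    return w_new


# With a zero data gradient, only the decay term moves the weight.
for wd in (1e-2, 1e-4):
    print(wd,
          sgd_step(1.0, 0.0, lr=0.1, wd=wd),
          adam_first_step(1.0, 0.0, lr=0.1, wd=wd, decoupled=False),
          adam_first_step(1.0, 0.0, lr=0.1, wd=wd, decoupled=True))
```

In the coupled Adam case the weight drops to roughly 0.9 for both values of `wd`, because the normalization divides the L2 term by its own magnitude. In the SGD and decoupled cases the shrinkage is `lr * wd`, giving 0.999 and 0.99999. Over many steps with real gradients the picture is more complicated. Still, this hints at why the same weight-decay value may behave differently under the two strategies.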

Feel free to comment and share your experiments.

Snowdar / asv-subtools