D0048 closed this 6 years ago
This also suggests trying larger learning rates with smaller batch sizes. In my tests, a learning rate of 0.1
works fine at first and converges to 96% accuracy, but accuracy then drops to 80% within one epoch, which I suspect is an overshoot.
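To illustrate the suspected overshoot, here is a minimal sketch on a toy one-dimensional quadratic loss (an assumption for illustration, not the SCSCNN model). Plain gradient descent on f(w) = 0.5·a·w² multiplies w by (1 − lr·a) at every step, so it only converges when lr < 2/a. Above that threshold each step jumps past the minimum and the loss grows. The function name `gd_quadratic` and the curvature value are hypothetical.

```python
def gd_quadratic(lr, curvature=10.0, w0=1.0, steps=50):
    """Plain gradient descent on the toy loss f(w) = 0.5 * curvature * w**2.

    Hypothetical example; returns the loss after each step.
    """
    w = w0
    losses = []
    for _ in range(steps):
        grad = curvature * w      # f'(w)
        w -= lr * grad            # equivalent to w <- (1 - lr * curvature) * w
        losses.append(0.5 * curvature * w * w)
    return losses


# For curvature = 10 the stability threshold is lr < 2 / 10 = 0.2.
for lr in (0.05, 0.15, 0.25):
    print(f"lr={lr}: final loss {gd_quadratic(lr)[-1]:.3e}")
```

With the default curvature, 0.05 and 0.15 converge (0.15 flips the sign of w every step but still shrinks it), while 0.25 diverges. In a real network the curvature differs across the loss surface, which is one plausible reason a rate that works early in training can overshoot later.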
Stochastic Gradient Descent may be worth trying.
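A minimal mini-batch SGD loop shows where the batch size and learning rate enter: each update uses the gradient averaged over `batch_size` freshly shuffled samples. This is a sketch on an assumed toy linear-regression problem; the function name `minibatch_sgd` and the synthetic data are hypothetical and not from this thread.

```python
import random


def minibatch_sgd(xs, ys, lr=0.1, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w * x + b by mini-batch SGD on mean squared error (toy sketch)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # new sample order every epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradient of mean((w*x + b - y)**2) over this mini-batch only
            gw = gb = 0.0
            for i in batch:
                err = w * xs[i] + b - ys[i]
                gw += 2.0 * err * xs[i]
                gb += 2.0 * err
            w -= lr * gw / len(batch)
            b -= lr * gb / len(batch)
    return w, b


xs = [i / 32 - 1 for i in range(64)]   # 64 points in [-1, 1)
ys = [3.0 * x + 1.0 for x in xs]       # noise-free line y = 3x + 1
w, b = minibatch_sgd(xs, ys, lr=0.1, batch_size=4)
print(w, b)
```

Smaller `batch_size` means more, noisier updates per epoch; the same `lr` can therefore behave differently when the batch size changes, so the two are worth sweeping together.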
Further tests pending.
@mwsht According to this article, the loss landscape of the SCSCNN model is extremely flat and full of saddle points, so the gradients there are close to zero...
Maybe consider an alternative training method?
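One simple alternative to try is adding small random noise to each gradient step, in the spirit of perturbed gradient descent. The sketch below uses a hypothetical two-parameter toy loss with a saddle at the origin (an assumption, not the SCSCNN loss): plain gradient descent started on the line y = 0 slides straight into the saddle and stalls, while the noisy variant escapes to one of the minima. The names `grad` and `descend` and the noise scale are illustrative only.

```python
import random


def grad(x, y):
    """Gradient of the toy loss f(x, y) = 0.5*x**2 - 0.5*y**2 + 0.25*y**4.

    (0, 0) is a saddle point; the minima are at (0, +1) and (0, -1).
    """
    return x, -y + y ** 3


def descend(x, y, lr=0.1, steps=500, noise=0.0, seed=0):
    """Gradient descent, optionally adding Gaussian noise to every gradient."""
    rng = random.Random(seed)
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Isotropic Gaussian perturbation; noise=0.0 gives plain gradient descent
        gx += rng.gauss(0.0, noise)
        gy += rng.gauss(0.0, noise)
        x -= lr * gx
        y -= lr * gy
    return x, y


plain = descend(1.0, 0.0)                  # y stays exactly 0: stuck at the saddle
perturbed = descend(1.0, 0.0, noise=1e-2)  # noise breaks the symmetry and escapes
print(plain, perturbed)
```

Mini-batch SGD already injects gradient noise of this kind, which is one reason it is often suggested for saddle-heavy landscapes; momentum-based optimizers are another common option to compare against.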