Possible explanation to the unstable behavior of SCSCNN in training

MyWorkShop / Convolutional-Neural-Networks-in-Convolution

Strided Convolution of Small CNN

Do What The F*ck You Want To Public License



Closed D0048 closed 6 years ago

D0048 commented 6 years ago

@mwsht According to this article, the loss surface of the SCSCNN model will be really flat and full of saddle points, so the gradient there is close to zero...

Maybe consider an alternative training method?

D0048 commented 6 years ago

This also suggests possible trials with a larger learning rate and smaller batch sizes. In my test, a learning rate of 0.1 works just fine for converging to an accuracy of 96%, but the accuracy then drops to 80% within one epoch, which I suspect is an overshoot.
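To illustrate what an overshoot looks like, here is a minimal sketch on a toy 1-D quadratic loss, not the SCSCNN model itself. The curvature value and the function name `gradient_descent` are assumptions made up for the example. On `f(w) = 0.5 * curvature * w**2`, each step multiplies `w` by `(1 - lr * curvature)`, so the iterates blow up once `lr` exceeds `2 / curvature`.

```python
def gradient_descent(curvature, lr, steps, w0=1.0):
    """Run plain gradient descent on f(w) = 0.5 * curvature * w**2."""
    w = w0
    for _ in range(steps):
        grad = curvature * w   # f'(w)
        w = w - lr * grad      # equivalent to w *= (1 - lr * curvature)
    return w


CURVATURE = 25.0  # hypothetical sharpness of the toy loss, chosen for illustration

# lr = 0.01 gives a step factor of 0.75, so w shrinks toward the minimum at 0.
w_small_lr = gradient_descent(CURVATURE, lr=0.01, steps=50)

# lr = 0.1 gives a step factor of -1.5, so w flips sign and grows every step.
w_large_lr = gradient_descent(CURVATURE, lr=0.1, steps=50)

print(abs(w_small_lr), abs(w_large_lr))
```

If something similar happens here, a learning rate that is fine in a flat region could become too large once training reaches a sharper one, so decaying the learning rate after the 96% point might be worth a test.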

Stochastic Gradient Descent may be worth trying.
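A rough sketch of why the noise in SGD might help with saddle points, again on a toy function rather than the real model. The surface `f(x, y) = x**2 + y**4 / 4 - y**2 / 2` is an assumption chosen because it has a saddle at `(0, 0)` and minima at `(0, 1)` and `(0, -1)`. The Gaussian noise added to the gradient only stands in for minibatch noise. All names are hypothetical.

```python
import random


def grad(x, y):
    """Gradient of the toy surface f(x, y) = x**2 + y**4 / 4 - y**2 / 2."""
    return 2.0 * x, y ** 3 - y


def descend(noise_std, steps=500, lr=0.1, seed=0):
    """Gradient descent from (1, 0); noise_std > 0 mimics minibatch noise."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    x, y = 1.0, 0.0            # start exactly on the ridge y = 0
    for _ in range(steps):
        gx, gy = grad(x, y)
        if noise_std > 0.0:
            gx += rng.gauss(0.0, noise_std)
            gy += rng.gauss(0.0, noise_std)
        x -= lr * gx
        y -= lr * gy
    return x, y


# Exact gradients: y never leaves 0, so the run ends at the saddle (0, 0).
x_plain, y_plain = descend(noise_std=0.0)

# Noisy gradients: a tiny push off the ridge grows until y settles near +1 or -1.
x_noisy, y_noisy = descend(noise_std=0.01)

print((x_plain, y_plain), (x_noisy, y_noisy))
```

Smaller batches mean noisier gradient estimates, which is one plausible reason they could help on a surface with many saddle points.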

Further tests pending.