IssamLaradji / sps

Official code for the Stochastic Polyak step-size optimizer

Query: Difference vs Adadelta? #3

Closed: abduallahmohamed closed this issue 2 years ago

abduallahmohamed commented 3 years ago

Hi, thanks for your work. I was wondering: what is the difference between your method and Adadelta?

IssamLaradji commented 3 years ago

You're welcome!

The difference is that Adadelta belongs to the Adagrad family and as such accumulates past gradients to set its step size, while SPS uses only the loss and gradient norm of the current iteration (see the sketch below). Furthermore, Adadelta appears to converge slowly [1], whereas SPS has a fast convergence rate when the interpolation condition is satisfied.

[1] http://akyrillidis.github.io/notes/AdaDelta
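To make the contrast concrete, here is a minimal plain-NumPy sketch (not the repo's PyTorch implementation). The SPS step size assumes the capped form from the SPS paper, `min((f_i(x) - f_i*) / (c * ||∇f_i(x)||^2), gamma_max)`, with `f_i* = 0` under interpolation; the Adadelta update follows its standard running-average recursions. All names here (`sps_step_size`, `AdadeltaState`, `c`, `gamma_max`) are illustrative, not the library's API:

```python
import numpy as np

def sps_step_size(loss, grad, c=0.5, loss_star=0.0, gamma_max=1.0):
    """Stochastic Polyak step size: a function of only the CURRENT
    mini-batch loss and gradient norm; no history is kept.
    loss_star is the optimal loss, taken as 0 under interpolation."""
    step = (loss - loss_star) / (c * np.dot(grad, grad) + 1e-12)
    return min(step, gamma_max)

class AdadeltaState:
    """Adadelta, by contrast, ACCUMULATES running averages of past
    squared gradients and past squared updates to scale each step."""
    def __init__(self, dim, rho=0.9, eps=1e-6):
        self.avg_sq_grad = np.zeros(dim)    # E[g^2]
        self.avg_sq_update = np.zeros(dim)  # E[dx^2]
        self.rho, self.eps = rho, eps

    def update(self, grad):
        """Return the step to subtract from the parameters."""
        self.avg_sq_grad = (self.rho * self.avg_sq_grad
                            + (1 - self.rho) * grad ** 2)
        step = (np.sqrt(self.avg_sq_update + self.eps)
                / np.sqrt(self.avg_sq_grad + self.eps)) * grad
        self.avg_sq_update = (self.rho * self.avg_sq_update
                              + (1 - self.rho) * step ** 2)
        return step
```

The key distinction shows up in the state: `sps_step_size` is a pure function that can be called fresh at every iteration, whereas `AdadeltaState` must carry its running averages across iterations, so its effective step size at any point depends on the whole gradient history.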