Closed jzilly closed 8 years ago
Are you sure? See: http://machinelearning.wustl.edu/mlpapers/paper_files/icml2013_sutskever13.pdf or http://publications.idiap.ch/downloads/reports/1995/95-04.pdf
The way to understand it is that we'd normally have θ -> θ - αg. Instead, we want to do θ -> θ - αg + γv, i.e. add a velocity vector. What we add at this step in total is the effective velocity (needed for the next step), which is -αg + γv.
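Written out with step indices, the update described above might look like the following. The subscript t and the names v_{t+1}, θ_{t+1} are notation assumed here for illustration; they do not appear in the thread.

```latex
\begin{aligned}
v_{t+1}      &= \gamma\, v_t - \alpha\, g_t \\
\theta_{t+1} &= \theta_t + v_{t+1} = \theta_t - \alpha\, g_t + \gamma\, v_t
\end{aligned}
```

Here g_t is the gradient at θ_t, and v_{t+1} is the total amount added at this step, which is carried over as the velocity for the next one.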
Thank you Rupesh for the reply.
Looking closer at it, I suppose the two ways are equivalent.
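One quick way to convince yourself of the equivalence is a small numerical check. The sketch below is not the library's stepper; the toy objective f(θ) = 0.5 θ², the function names, and the hyperparameter values are all assumptions made up for illustration. It runs both sign conventions from the same start with the velocity initialised to zero.

```python
def grad(theta):
    # Gradient of the toy objective f(theta) = 0.5 * theta**2 (assumed for illustration).
    return theta


def run_subtract_form(theta, alpha, gamma, steps):
    # Convention from the issue text: v = gamma * v + alpha * g, theta = theta - v
    v = 0.0
    for _ in range(steps):
        g = grad(theta)
        v = gamma * v + alpha * g
        theta = theta - v
    return theta, v


def run_add_form(theta, alpha, gamma, steps):
    # Convention from the reply: v = gamma * v - alpha * g, theta = theta + v
    v = 0.0
    for _ in range(steps):
        g = grad(theta)
        v = gamma * v - alpha * g
        theta = theta + v
    return theta, v


theta_a, v_a = run_subtract_form(1.0, alpha=0.1, gamma=0.9, steps=50)
theta_b, v_b = run_add_form(1.0, alpha=0.1, gamma=0.9, steps=50)

# The parameters should agree, and the velocities should differ only in sign.
print(abs(theta_a - theta_b), abs(v_a + v_b))
```

The two loops track the same parameter trajectory because the velocity in one is simply the negative of the velocity in the other, so the sign flips in the two update lines cancel.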
See you tomorrow.
In reply to Rupesh Kumar Srivastava's comment above, sent on 16.11.2015 at 20:41.
The momentum stepper should implement the following equations (g is the gradient):
v = gamma * v + alpha * g
theta = theta - v
Instead, the implementation does:
v = gamma * v - alpha * g
theta = theta + v
In essence there is a sign error. Will submit a pull request soon.