fmeirinhos / pytorch-hessianfree

PyTorch implementation of Hessian-free optimisation

MIT License




pytorch-hessianfree


Implements parts of "Training Deep and Recurrent Networks with Hessian-Free Optimization" by Martens and Sutskever (2012):

Preconditioner for CG, including the empirical Fisher diagonal (Section 20.11)
Gauss-Newton and Hessian curvature matrices (Sections 20.5 & 20.6)
Martens' CG stopping criterion (Section 20.4)
CG backtracking (Section 20.8.7)
Tikhonov damping with a Levenberg-Marquardt-like heuristic (Sections 20.8.1 & 20.8.5)
Line search (Section 20.8.5)
Separate batches for computing the curvature and the gradient, by passing the vector b (in A x = b) as a callable (Section 20.12)
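For the Gauss-Newton and Hessian item, here is a minimal sketch, assuming the curvature-vector products are formed with `torch.autograd` double backprop instead of building either matrix. The helper names `hessian_vector_product` and `gauss_newton_vector_product` are hypothetical and are not the repository's API. `outputs` are the network outputs, and `loss` is assumed to be a twice-differentiable function of them. The Gauss-Newton product is G v = J^T H_L J v, where J = d outputs / d params and H_L is the Hessian of the loss with respect to the outputs.

```python
import torch


def hessian_vector_product(loss, params, vec):
    """Return H v, where H is the Hessian of `loss` w.r.t. `params`.

    `params` and `vec` are sequences of tensors with matching shapes.
    """
    # First derivative with create_graph=True so it can be differentiated again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # d/dparams (grad . v) = H v  (double backprop)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    # retain_graph=True so the same forward pass can serve many CG iterations.
    return torch.autograd.grad(dot, params, retain_graph=True)


def gauss_newton_vector_product(outputs, loss, params, vec):
    """Return G v = J^T H_L J v, with J = d outputs / d params and
    H_L the Hessian of `loss` w.r.t. `outputs`."""
    # J v via the double-vjp trick: J^T u is linear in a dummy u, and
    # differentiating it w.r.t. u in the direction v yields J v.
    u = torch.zeros_like(outputs, requires_grad=True)
    jt_u = torch.autograd.grad(outputs, params, grad_outputs=u, create_graph=True)
    jv = torch.autograd.grad(jt_u, u, grad_outputs=vec, retain_graph=True)[0]
    # H_L (J v): differentiate dL/doutputs in the direction J v.
    dl_dout = torch.autograd.grad(loss, outputs, create_graph=True)[0]
    hjv = torch.autograd.grad(dl_dout, outputs, grad_outputs=jv, retain_graph=True)[0]
    # J^T (H_L J v): an ordinary vector-Jacobian product.
    return torch.autograd.grad(outputs, params, grad_outputs=hjv, retain_graph=True)
```

Every call retains the autograd graph, so one forward pass can be reused across all CG iterations. When the loss is convex in the outputs, G is positive semi-definite. That property is the usual reason to prefer it over H inside CG.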
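Several of the CG items (the diagonal preconditioner, Martens' stopping criterion, backtracking, and b supplied as a callable) can be sketched together on flattened parameter vectors. This is a sketch under stated assumptions, not the repository's code. The names `preconditioned_cg` and `cg_backtrack` are hypothetical. The stopping rule uses k = max(10, 0.1 i) and epsilon = 5e-4 as assumed defaults. Snapshots are kept at iterations ceil(1.3^j), which is an assumed spacing factor. In the README's setting, `A` would be a damped curvature-vector product, `b` the negative gradient (possibly computed on a different batch, hence the callable), and `m_diag` a damped empirical Fisher diagonal. In this sketch, any positive diagonal works.

```python
import math
import torch


def preconditioned_cg(A, b, x0, m_diag=None, max_iter=250, eps=5e-4,
                      gamma=1.3, tol=1e-10):
    """Approximately solve A x = b, i.e. minimise phi(x) = 0.5 x^T A x - b^T x.

    A      -- callable returning the matrix-vector product A @ v (flat tensors)
    b      -- flat tensor, or a zero-argument callable returning one
    m_diag -- diagonal of the preconditioner M (None: no preconditioning)
    Returns the final iterate and (iteration, iterate) snapshots for backtracking.
    """
    b = b() if callable(b) else b
    x = x0.clone()
    r = b - A(x)                                   # residual b - A x
    z = r / m_diag if m_diag is not None else r    # preconditioned residual M^-1 r
    p = z.clone()
    rz = torch.dot(r, z)
    phis, snapshots = [], []
    j, n_done = 0, 0
    for i in range(1, max_iter + 1):
        Ap = A(p)
        pAp = torch.dot(p, Ap)
        if pAp <= 0:              # non-positive curvature along p: stop
            break
        alpha = rz / pAp
        x = x + alpha * p
        r = r - alpha * Ap
        n_done = i
        # Since A x = b - r, phi(x) = -0.5 x^T (b + r): no extra product with A.
        phis.append(-0.5 * torch.dot(x, b + r).item())
        # Keep iterates at i = ceil(gamma^j) for later backtracking.
        if i >= math.ceil(gamma ** j):
            snapshots.append((i, x.clone()))
            while math.ceil(gamma ** j) <= i:
                j += 1
        # Martens-style relative-progress test over the last k iterations.
        k = max(10, int(0.1 * i))
        phi = phis[-1]
        if i > k and phi < 0 and (phi - phis[i - 1 - k]) / phi < k * eps:
            break
        if torch.norm(r) < tol:
            break
        z = r / m_diag if m_diag is not None else r
        rz_new = torch.dot(r, z)
        p = z + (rz_new / rz) * p
        rz = rz_new
    # Always keep the final iterate as the last snapshot.
    if n_done and (not snapshots or snapshots[-1][0] != n_done):
        snapshots.append((n_done, x.clone()))
    return x, snapshots


def cg_backtrack(snapshots, f):
    """Walk back from the last snapshot while the objective f keeps improving."""
    best_x = snapshots[-1][1]
    best_f = f(best_x)
    for _, x in reversed(snapshots[:-1]):
        fx = f(x)
        if fx >= best_f:
            break
        best_x, best_f = x, fx
    return best_x, best_f
```

In an optimiser step, `f` passed to `cg_backtrack` would evaluate the true loss at theta + x on held-out data, so that an earlier, shorter CG solution can be preferred when the quadratic model has become unreliable.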
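For the damping item, Tikhonov damping adds lambda * I to the curvature matrix. A Levenberg-Marquardt-style heuristic then rescales lambda by the reduction ratio rho = (f(theta + p) - f(theta)) / (q(p) - q(0)), where q is the quadratic model. The sketch below assumes the commonly quoted HF constants: multiply by 3/2 when rho < 1/4 and by 2/3 when rho > 3/4. The function names are hypothetical, and the treatment of a non-negative predicted change is an added assumption.

```python
def damped_matvec(matvec, damping):
    """Wrap a curvature-vector product B v as (B + damping * I) v."""
    return lambda v: matvec(v) + damping * v


def levenberg_marquardt_update(damping, actual_change, predicted_change,
                               boost=1.5, drop=2.0 / 3.0):
    """Rescale the Tikhonov damping from rho = actual / predicted change.

    actual_change    -- f(theta + p) - f(theta)
    predicted_change -- q(p) - q(0) from the quadratic model (negative after CG)
    """
    if predicted_change >= 0:
        # Assumption: a model that predicts no decrease counts as a failed step.
        rho = float("-inf")
    else:
        rho = actual_change / predicted_change
    if rho < 0.25:
        damping *= boost          # model too optimistic: trust it less
    elif rho > 0.75:
        damping *= drop           # model accurate: allow larger steps
    return damping, rho
```

Keeping the damping outside the curvature product, as `damped_matvec` does, lets the same Gauss-Newton or Hessian product be reused while lambda changes between iterations.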
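For the line-search item, here is a minimal backtracking sketch with an Armijo sufficient-decrease test, applied along the update direction p returned by CG. The starting step of 1, the shrink factor of 0.8 and c = 1e-2 are assumed defaults. The name `backtracking_line_search` is hypothetical, and the repository's own search may differ.

```python
def backtracking_line_search(f, f0, slope, alpha=1.0, shrink=0.8, c=1e-2,
                             max_steps=60):
    """Return the first step size alpha satisfying the Armijo condition
    f(alpha) <= f0 + c * alpha * slope, or 0.0 if none is found.

    f     -- callable: step size -> objective at theta + alpha * p
    f0    -- objective at alpha = 0
    slope -- directional derivative grad^T p (negative for a descent direction)
    """
    for _ in range(max_steps):
        if f(alpha) <= f0 + c * alpha * slope:
            return alpha
        alpha *= shrink           # step too long: shrink and retry
    return 0.0                    # no acceptable step: leave parameters unchanged
```

Returning 0.0 rather than raising lets the caller reject the step and, for example, increase the damping before the next iteration.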

Still to do:

Scale-sensitive damping (Section 20.8.3)

Not fully tested; use with caution!