Before starting this, we need a way to systematically:

- trace the number of function, gradient, and Hessian calls (a tracing sketch follows this list)
- stop on a universal criterion, OR plot the objective value decreasing until it reaches the same value as the reference package's
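A minimal sketch of how the tracing could work, assuming our solvers accept plain Python callables for the objective, gradient, and Hessian; `TracedFunction` is a hypothetical helper, not an existing API in this package or any library.

```python
import numpy as np

class TracedFunction:
    """Wraps a callable so each evaluation is counted and, optionally,
    its return value recorded (hypothetical helper, not an existing API)."""

    def __init__(self, fn, record_values=False):
        self.fn = fn
        self.n_calls = 0
        self.history = [] if record_values else None

    def __call__(self, *args, **kwargs):
        self.n_calls += 1
        value = self.fn(*args, **kwargs)
        if self.history is not None:
            self.history.append(value)
        return value

# Example: trace a small quadratic objective and its gradient.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = TracedFunction(lambda x: 0.5 * x @ A @ x - b @ x, record_values=True)
grad = TracedFunction(lambda x: A @ x - b)

# Pass f and grad to the solver under test, then inspect f.n_calls and
# grad.n_calls, and plot f.history to show the objective decreasing
# toward the reference package's final value.
```

The same wrapper covers all three counters: wrap the objective, gradient, and Hessian callables separately and read each wrapper's `n_calls` after the run.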
Against scikit-learn

Expect performance on par with or better than scikit-learn (see the harness sketch after this checklist).
- [ ] ridge (for large problems we should see improvements from using CG)
    - [ ] analytical (should see no real difference)
    - [ ] CG
- [ ] lasso
    - [ ] FISTA
    - [ ] ISTA
- [ ] elnet
- [ ] logistic (no penalty or l2 penalty)
    - (cannot test plain Newton)
    - [ ] Newton-CG (see issue #24)
    - [ ] L-BFGS
- [ ] logistic (elnet penalty)
    - [ ] FISTA
    - [ ] ISTA
- [ ] multinomial (no penalty or l2 penalty)
    - [ ] Newton-CG (see issue #24)
    - [ ] L-BFGS
- [ ] multinomial (elnet penalty)
    - [ ] FISTA
    - [ ] ISTA
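As an illustration of what a single checklist entry could look like, here is a minimal ridge harness against scikit-learn. `our_ridge_cg` is a hypothetical stand-in for this package's solver, and the data sizes are arbitrary; only the scikit-learn side is concrete.

```python
import time

import numpy as np
from sklearn.linear_model import Ridge

def ridge_objective(X, y, coef, alpha):
    # The objective scikit-learn's Ridge minimizes:
    # ||y - Xw||^2 + alpha * ||w||^2 (no intercept here).
    resid = y - X @ coef
    return resid @ resid + alpha * coef @ coef

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 200))   # arbitrary benchmark sizes
y = X @ rng.standard_normal(200) + rng.standard_normal(5000)
alpha = 1.0

# Reference fit and timing.
t0 = time.perf_counter()
ref = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
t_ref = time.perf_counter() - t0
obj_ref = ridge_objective(X, y, ref.coef_, alpha)
print(f"scikit-learn: {t_ref:.3f}s, final objective {obj_ref:.6f}")

# Ours (hypothetical API): coef = our_ridge_cg(X, y, alpha=alpha)
# Compare wall time, TracedFunction call counts, and
# ridge_objective(X, y, coef, alpha) against obj_ref; the objectives
# should agree to within the stopping tolerance.
```

The other entries would follow the same pattern: time the scikit-learn solver, time ours with traced callables, and check that both reach the same final objective value.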
Against quantreg

Expect somewhat worse performance (quantreg is effectively implemented in C++); see the note below on comparing objectives rather than timings.
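Raw timings against compiled code will be apples-to-oranges, so the more meaningful check is that both implementations reach the same final check-loss value. A minimal sketch of that loss; the 1/n normalization here is an assumption and must match whatever normalization each package uses.

```python
import numpy as np

def check_loss(y, y_pred, tau):
    """Quantile regression check loss: mean of rho_tau(y - y_pred),
    where rho_tau(u) = u * (tau - 1{u < 0}).
    The 1/n averaging is an assumed normalization."""
    u = y - y_pred
    return np.mean(u * (tau - (u < 0)))

# Evaluate check_loss on our fit and on quantreg's (e.g. R's
# quantreg::rq, run separately); equal losses mean both reached an
# optimum even if the wall times differ.
```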