Before starting this, we need a way to systematically:

- trace the number of function, gradient, and Hessian calls (a tracing sketch follows this list)
- stop on a universal criterion, OR plot the objective value decreasing until it reaches the same value as the reference package's
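A minimal sketch of how the tracing could work, assuming our solvers accept plain Python callables for the objective, gradient, and Hessian; `TracedFunction` is a hypothetical helper, not an existing API in this package or any library.

```python
import numpy as np

class TracedFunction:
    """Wraps a callable so each evaluation is counted and, optionally,
    its return value recorded (hypothetical helper, not an existing API)."""

    def __init__(self, fn, record_values=False):
        self.fn = fn
        self.n_calls = 0
        self.history = [] if record_values else None

    def __call__(self, *args, **kwargs):
        self.n_calls += 1
        value = self.fn(*args, **kwargs)
        if self.history is not None:
            self.history.append(value)
        return value

# Example: trace a small quadratic objective and its gradient.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = TracedFunction(lambda x: 0.5 * x @ A @ x - b @ x, record_values=True)
grad = TracedFunction(lambda x: A @ x - b)

# Pass f and grad to the solver under test, then inspect f.n_calls and
# grad.n_calls, and plot f.history to show the objective decreasing
# toward the reference package's final value.
```

The same wrapper covers all three counters: wrap the objective, gradient, and Hessian callables separately and read each wrapper's `n_calls` after the run.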
Against scikit-learn

Expect performance on par with or better than scikit-learn (see the harness sketch after this checklist).
- [ ] ridge (for large problems we should see improvements from using CG)
    - [ ] analytical (should see no real difference)
    - [ ] CG
- [ ] lasso
    - [ ] FISTA
    - [ ] ISTA
- [ ] elnet
- [ ] logistic (no penalty or l2 penalty)
    - (cannot test plain Newton)
    - [ ] Newton-CG (see issue #24)
    - [ ] L-BFGS
- [ ] logistic (elnet penalty)
    - [ ] FISTA
    - [ ] ISTA
- [ ] multinomial (no penalty or l2 penalty)
    - [ ] Newton-CG (see issue #24)
    - [ ] L-BFGS
- [ ] multinomial (elnet penalty)
    - [ ] FISTA
    - [ ] ISTA
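As an illustration of what a single checklist entry could look like, here is a minimal ridge harness against scikit-learn. `our_ridge_cg` is a hypothetical stand-in for this package's solver, and the data sizes are arbitrary; only the scikit-learn side is concrete.

```python
import time

import numpy as np
from sklearn.linear_model import Ridge

def ridge_objective(X, y, coef, alpha):
    # The objective scikit-learn's Ridge minimizes:
    # ||y - Xw||^2 + alpha * ||w||^2 (no intercept here).
    resid = y - X @ coef
    return resid @ resid + alpha * coef @ coef

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 200))   # arbitrary benchmark sizes
y = X @ rng.standard_normal(200) + rng.standard_normal(5000)
alpha = 1.0

# Reference fit and timing.
t0 = time.perf_counter()
ref = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
t_ref = time.perf_counter() - t0
obj_ref = ridge_objective(X, y, ref.coef_, alpha)
print(f"scikit-learn: {t_ref:.3f}s, final objective {obj_ref:.6f}")

# Ours (hypothetical API): coef = our_ridge_cg(X, y, alpha=alpha)
# Compare wall time, TracedFunction call counts, and
# ridge_objective(X, y, coef, alpha) against obj_ref; the objectives
# should agree to within the stopping tolerance.
```

The other entries would follow the same pattern: time the scikit-learn solver, time ours with traced callables, and check that both reach the same final objective value.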
Against quantreg

Expect somewhat worse performance (quantreg is effectively implemented in C++); see the note below on comparing objectives rather than timings.
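Raw timings against compiled code will be apples-to-oranges, so the more meaningful check is that both implementations reach the same final check-loss value. A minimal sketch of that loss; the 1/n normalization here is an assumption and must match whatever normalization each package uses.

```python
import numpy as np

def check_loss(y, y_pred, tau):
    """Quantile regression check loss: mean of rho_tau(y - y_pred),
    where rho_tau(u) = u * (tau - 1{u < 0}).
    The 1/n averaging is an assumed normalization."""
    u = y - y_pred
    return np.mean(u * (tau - (u < 0)))

# Evaluate check_loss on our fit and on quantreg's (e.g. R's
# quantreg::rq, run separately); equal losses mean both reached an
# optimum even if the wall times differ.
```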