baggepinnen / FluxOptTools.jl

Use Optim to train Flux models and visualize loss landscapes
MIT License

Questions FluxOptTools usability #14

Closed bienpierre closed 2 years ago

bienpierre commented 2 years ago

Awesome work on FluxOptTools!

I have two main questions about the usability of the package:

1) Zygote.hessian can construct the Hessian. Is it possible to compute the Hessian matrix in order to train the model with second-order methods from Optim.jl? Do you have an idea of what the benefit of using second-order methods rather than first-order methods would be?

2) Can MLJFlux be used to train networks with Optim.jl through FluxOptTools.jl?

Warmest regards.

baggepinnen commented 2 years ago

Hello and thank you :) I'm not sure how easy it would be to add Hessian functionality to this package; I have not used it myself in many years. Second-order methods converge faster near an optimum, but are unlikely to be much better if you start far away from a good optimum. The Hessian is also very expensive to compute and store for large models like neural networks, though it is much more feasible for smaller models with fewer parameters. You could try LBFGS from Optim, which is a quasi-Newton method that internally approximates the Hessian using only first-order (gradient) information.
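To make the convergence point concrete, here is a minimal sketch in plain Python. It is not from this thread and does not use Optim, Flux or FluxOptTools. The toy objective f(x, y) = x²/2 + 5y² + (x⁴ + y⁴)/4, the start point, the step size and the tolerance are all illustrative assumptions. It counts how many iterations plain gradient descent and Newton's method need when both start inside the basin of the minimum at the origin.

```python
import math

def grad(x, y):
    # Gradient of the assumed toy objective f(x, y) = x^2/2 + 5*y^2 + (x^4 + y^4)/4
    return (x + x**3, 10.0 * y + y**3)

def hess_diag(x, y):
    # The Hessian of this objective is diagonal, so only the diagonal is stored
    return (1.0 + 3.0 * x**2, 10.0 + 3.0 * y**2)

def gradient_step(x, y, gx, gy, lr=0.09):
    # First-order update: fixed step along the negative gradient
    return x - lr * gx, y - lr * gy

def newton_step(x, y, gx, gy):
    # Second-order update: gradient rescaled by the inverse (diagonal) Hessian
    hx, hy = hess_diag(x, y)
    return x - gx / hx, y - gy / hy

def steps_to_converge(update, start=(0.5, 0.5), tol=1e-10, max_steps=10_000):
    # Count updates until the gradient norm drops below tol
    x, y = start
    for k in range(max_steps):
        gx, gy = grad(x, y)
        if math.hypot(gx, gy) < tol:
            return k
        x, y = update(x, y, gx, gy)
    return max_steps

gd_steps = steps_to_converge(gradient_step)
newton_steps = steps_to_converge(newton_step)
print(f"gradient descent: {gd_steps} steps, Newton: {newton_steps} steps")
```

On this two-parameter problem the Newton iteration needs far fewer steps, but each step uses curvature information. For a model with n parameters the full Hessian has n² entries, which is why quasi-Newton methods such as LBFGS build an approximation from gradients instead.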

I know nothing about MLJFlux unfortunately.

/Fredrik

bienpierre commented 2 years ago

Thanks for answering. I see your point about second-order methods.

Concerning MLJFlux, I had a look at the main function that trains the neural network (fit!(loss, penalty, chain, optimiser, epochs, verbosity, X, y)) and I added a method.

Regards