denizyuret / Knet.jl

Koç University deep learning framework.
https://denizyuret.github.io/Knet.jl/latest

LBFGS optimizer #590

Open qzhu2017 opened 4 years ago

qzhu2017 commented 4 years ago

I know most people use Adam or SGD to optimize the weights of a neural network. However, for some small NN architectures, an optimization method with line search (e.g., LBFGS) can be far more efficient. I wonder if the Knet developers would be interested in implementing LBFGS?

denizyuret commented 4 years ago

Optim.jl may already have this.
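
For reference, a minimal sketch of calling Optim.jl's L-BFGS on a plain objective (the toy function and starting point below are made up for illustration; Optim.jl falls back to finite-difference gradients when none are supplied):

using Optim

rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2   # toy objective
x0 = zeros(2)
res = optimize(rosenbrock, x0, LBFGS())    # finite-difference gradients by default
xmin = Optim.minimizer(res)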

qzhu2017 commented 4 years ago

Yes, I know. But can I use it with Knet? I tried, and it is not obvious. To be more specific, Optim.jl needs the parameters x and the gradient g to optimize a function f(x).

Both x and g need to be converted to a 1D array. The main trouble is g: my student and I tried to transform g into something Optim.jl can accept, but we failed. Not sure if someone has experience with this.
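
For context, Optim.jl's gradient-based methods work on a flat 1D parameter vector and an in-place gradient g!(G, x); a sketch with a toy quadratic (the shapes and the vec/reshape flattening below are illustrative assumptions, not Knet code):

using Optim

W0 = randn(3, 4)                              # hypothetical weight matrix
f(x) = sum(abs2, reshape(x, size(W0)))        # objective over the flat vector
g!(G, x) = (G .= 2 .* x)                      # in-place gradient, also flat
res = optimize(f, g!, vec(copy(W0)), LBFGS())
W = reshape(Optim.minimizer(res), size(W0))   # back to the original shape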

denizyuret commented 4 years ago
using Knet                # Param, @diff, and grad come from AutoGrad and are re-exported by Knet

x = Param(Array(...))     # wrap the weights in a Param so gradients are tracked
J = @diff f(x)            # evaluate f(x) while recording a differentiation tape
g = grad(J, x)            # g has the same type/shape as x

Should give you a g with the exact type/shape as x. See @doc AutoGrad.
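
Putting the two together, a rough sketch of how grad could be wired into Optim.jl's in-place gradient (the flatten/copy helper and the toy loss below are assumptions for illustration, not Knet API):

using Knet, Optim                    # Knet re-exports AutoGrad's Param, @diff, grad, value

w = Param(randn(10, 5))              # hypothetical weight array
loss(w) = sum(abs2, w)               # hypothetical loss

set!(x) = copyto!(value(w), reshape(x, size(value(w))))   # copy the flat vector into w

f_flat(x) = (set!(x); loss(w))       # objective over the flat vector

function g!(G, x)
    set!(x)
    J = @diff loss(w)
    G .= vec(grad(J, w))             # grad(J, w) has the same shape as w; flatten it
end

res = optimize(f_flat, g!, vec(copy(value(w))), LBFGS())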