JuliaManifolds / Manopt.jl

🏔️ Manopt.jl – Optimization on Manifolds in Julia
http://manoptjl.org

Introduce `default_stepsize(M, O)` #180

Closed mateuszbaran closed 1 year ago

mateuszbaran commented 1 year ago

The following code demonstrates how conjugate gradient descent fails on the Rosenbrock function with the default parameters:

using Manopt
using Manifolds

f_manopt(::Euclidean, x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

function g_manopt!(::Euclidean, storage, x)
    storage[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    storage[2] = 200.0 * (x[2] - x[1]^2)
    return storage
end

M = Euclidean(2)
cg_opts = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), return_options=true)

Could we have default parameters that make it work?

kellertuer commented 1 year ago

If you find default parameters that do work – sure; I am not 100% sure where the current defaults come from, I think those are the ones from Manopt (Matlab).

In general, setting good defaults is a really complicated thing to do – and remember that Rosenbrock is a really mean example. But – hehe – this one really explodes fast:

julia> cg_opts = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), debug=[:Iteration, :Iterate, :Cost, :Stop,"\n"])
Initial F(x): 1.000000
# 1     x: [2.0, 0.0]F(x): 1601.000000
# 2     x: [-3200.0, 800.0]F(x): 10484121674246400.000000
# 3     x: [2.6212863989790594e13, -3.2725775149997744e12]F(x): 47212597681438596372495799766368366330997806787448537088.000000
# 4     x: [-1.4408985664390341e43, 8.994538430325952e41]F(x): 4310559429836373893762014461052062964684649679539072155822222971503375833074684002488557111981322457934557086002722283200061060668117239830893990062402996910678581744400072704.000000
# 5     x: [2.393261832712779e132, -7.46974354390761e130]F(x): Inf
# 6     x: [NaN, NaN]F(x): NaN

kellertuer commented 1 year ago

Ah, I see, that is the same problem as with gradient descent. We have a constant stepsize as the default for both.

For example, with the default Armijo linesearch:

julia> cg_opts = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), stepsize=ArmijoLinesearch(M),  debug=[:Iteration, :Iterate, :Cost, :Stop,"\n",100])
Initial F(x): 1.000000
# 100   x: [0.3789170183626525, 0.1455948665234621]F(x): 0.386151
# 200   x: [0.4604486280239601, 0.21308291361774606]F(x): 0.291230
# 300   x: [0.5157252480309722, 0.26664128363827033]F(x): 0.234567
# 400   x: [0.5595346370785345, 0.3136420428474512]F(x): 0.194041
# 500   x: [0.5940718838053803, 0.3531472767461153]F(x): 0.164783
The algorithm reached its maximal number of iterations (500).
2-element Vector{Float64}:
 0.5940718838053803
 0.3531472767461153

It already looks better. The question here is – whom do we want to annoy with our defaults?

I think we could switch to the first case, since we now have default_retraction_method(M) – the reason to switch to that was that the old default within Armijo was the ExponentialRetraction (and that constructor did not have a manifold in its arguments).
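
As a small illustration (this just mirrors the calls used above, nothing new API-wise):

using Manopt, Manifolds

M = Euclidean(2)
# constructing the linesearch with the manifold lets it pick its retraction
# via default_retraction_method(M) instead of a hard-coded ExponentialRetraction
ls = ArmijoLinesearch(M)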

kellertuer commented 1 year ago

...and of course you can tweak Armijo as well:

x2 = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), stepsize=ArmijoLinesearch(M; contraction_factor=0.99),  debug=[:Iteration, :Iterate, :Cost, :Stop,"\n",100])
Initial F(x): 1.000000
# 100   x: [0.3861691582528581, 0.15159518234304356]F(x): 0.377398
# 200   x: [0.47064890494573247, 0.2227435429442573]F(x): 0.280365
# 300   x: [0.5267486471817312, 0.2782161517108435]F(x): 0.224023
# 400   x: [0.568652360020186, 0.32391203359214754]F(x): 0.186091
# 500   x: [0.602145033676647, 0.3629237886067693]F(x): 0.158300
The algorithm reached its maximal number of iterations (500).
2-element Vector{Float64}:
 0.602145033676647
 0.3629237886067693

x3 = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), stepsize=ArmijoLinesearch(M; contraction_factor=0.999, sufficient_decrease=0.00001),  debug=[:Iteration, :Iterate, :Cost, :Stop,"\n",100])
Initial F(x): 1.000000
# 100   x: [0.3116617176333236, 0.16292536758228865]F(x): 0.906673
# 200   x: [0.36268757688393066, 0.19533509991927367]F(x): 0.813120
# 300   x: [0.40246487607807646, 0.22263562719679558]F(x): 0.724983
# 400   x: [0.43777261003280743, 0.2491305562675438]F(x): 0.646560
# 500   x: [0.47052893901998494, 0.27567440680272826]F(x): 0.574938
The algorithm reached its maximal number of iterations (500).
2-element Vector{Float64}:
 0.47052893901998494
 0.27567440680272826

f_manopt(M, x2) # is 0.15830048687506507
f_manopt(M, x3) # is 0.1456151297663319

Or you can even switch to a different CG coefficient:

x4 = conjugate_gradient_descent(M, f_manopt, g_manopt!, [0.0, 0.0]; evaluation=MutatingEvaluation(), stepsize=ArmijoLinesearch(M; contraction_factor=0.99, sufficient_decrease=0.00001),  debug=[:Iteration, :Iterate, :Cost, :Stop,"\n",100], coefficient=FletcherReevesCoefficient())
Initial F(x): 1.000000
# 100   x: [0.5104355347188466, 0.2533412185265233]F(x): 0.244862
# 200   x: [1.3625537059095383, 1.8580866725590546]F(x): 0.131681
# 300   x: [0.707132629054107, 0.49838856349984095]F(x): 0.086043
# 400   x: [1.1917317530345954, 1.401468104467437]F(x): 0.071942
# 500   x: [0.7745036515003424, 0.6057082658975672]F(x): 0.054274
The algorithm reached its maximal number of iterations (500).
2-element Vector{Float64}:
 0.7745036515003424
 0.6057082658975672

yields

julia> f_manopt(M,x4)
0.0542736146044402

So it is again often the question: provide good defaults that work “okayish” – or optimise for a specific problem.

mateuszbaran commented 1 year ago

I think it's safe to assume that people who would like gradient-based optimization on a manifold do it on a manifold with a reasonable default_retraction_method(M). I wouldn't over-optimize for Rosenbrock but it would be nice to have defaults that make it work sort of OK -- maybe even at the cost of special-casing defaults for R^n. Maybe such defaults would also make sense for other nonpositively-curved manifolds?
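
Just to sketch what I mean by special-casing (my_default_stepsize is a placeholder name, not an existing Manopt function, and the chosen values are arbitrary):

using Manopt, Manifolds

# keep a conservative constant stepsize as the generic fallback,
# but use Armijo on flat R^n where it is known to behave reasonably
my_default_stepsize(M::AbstractManifold) = ConstantStepsize(1.0)
my_default_stepsize(M::Euclidean) = ArmijoLinesearch(M)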

kellertuer commented 1 year ago

Sure, we can switch to Armijo in Manopt 0.4.

The safest way of course could be to have an “empty” default if none is defined (but that would be a change to ManifoldsBase) and choose a constant stepsize then. But this means that, when defining exp, one would also have to set the default oneself.
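
Roughly like this, as a sketch (has_retraction_default stands in for the hypothetical ManifoldsBase change and does not exist; here it simply always answers true):

using Manopt, Manifolds

# stand-in for the proposed “empty default” query in ManifoldsBase
has_retraction_default(::AbstractManifold) = true

function pick_default_stepsize(M::AbstractManifold)
    if has_retraction_default(M)
        return ArmijoLinesearch(M)
    else
        return ConstantStepsize(1.0)  # fall back to a constant stepsize as described above
    end
end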

kellertuer commented 1 year ago

Currently I have no way of checking the curvature before setting the default, so if I switch, I would switch to Armijo for all manifolds.

mateuszbaran commented 1 year ago

Could we have a get_default_linesearch(::Manifold, ::Type{<:Options}) function in Manopt 0.4?

kellertuer commented 1 year ago

That sounds like a good idea :)

There are quite a few things to keep in mind for 0.4, but sure, we can do a few PRs on main before releasing that one. Let me finish the large rework for costgrad first.

edit: Maybe default_linesearch fits better, since the other defaults also do not have a get prefix – and it is a little difficult to define these defaults, since that would have to be done only if Manifolds.jl is loaded (and I do not have good defaults for specific manifolds for now).

kellertuer commented 1 year ago

Ah, and I would maybe call that default_stepsize, since there are quite a few rules that do not search at all ;)

kellertuer commented 1 year ago

I noticed two problems with this idea.

1) If so, we would only define default_stepsize(::AbstractManifold, ::MyConcreteSolverState) within Manopt.jl; so later on, for specific manifolds, one could only overwrite this for a specific manifold together with a specific solver (or at least a class of manifolds with a specific solver), which I am not sure is that useful.

2) More problematic is how to use this. My idea was to replace the current stepsize=... with stepsize=default_stepsize(M, ...) – but at that point the state does not yet exist (it is only generated later in the constructor), so we cannot specify the second argument at the point where I wanted to use this.

What should we do about these two points to make this usable and easy to implement?

mateuszbaran commented 1 year ago

Hm, I think this shouldn't dispatch on the solver state but on the solver type. So we may have something like default_stepsize(::AbstractManifold, ::typeof(quasi_Newton)). Or we can specify the solver type using a new type hierarchy, which may be a bit nicer but more complicated to implement. Note that the Optimization.jl integration also needs a solver type thing: https://github.com/SciML/Optimization.jl/pull/437/files#diff-91c386d7b5fa9310b5c87114b777bc81e69479305f504a2f361228fd1b6fd8af but it needs to collect more information (e.g. stepsize), so it must be separate.
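
A minimal sketch of that dispatch (default_stepsize_sketch is a placeholder name and the chosen stepsizes are arbitrary):

using Manopt, Manifolds

# dispatch the default on the high-level solver function itself
default_stepsize_sketch(M::AbstractManifold, ::typeof(quasi_Newton)) = ArmijoLinesearch(M)
default_stepsize_sketch(M::AbstractManifold, ::typeof(gradient_descent)) = ConstantStepsize(1.0)

default_stepsize_sketch(Euclidean(2), quasi_Newton)  # returns an ArmijoLinesearch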

kellertuer commented 1 year ago

Good that I have a hierarchy established for the SolverStates (formerly Options), which identify the solver nearly uniquely. By nearly I mean that the solver functions depend on both Problem and State, but in practice the State determines the solver. So we could use that also as the type hierarchy for Optimization.jl.

I think the type idea you posted is not 100% correct, since quasi_Newton is the high-level interface function, and I would prefer to use the AbstractSolverState type hierarchy for the second argument, but in reality that is also just a minor difference.

mateuszbaran commented 1 year ago

So, your idea is to use something like default_stepsize(::AbstractManifold, ::Type{T}) where T<:AbstractSolverState? That would also work, I think.

kellertuer commented 1 year ago

I think that also looks nicer than using the types of the high-level functions like quasi_Newton or DouglasRachford – and we actually can identify the solver based on its State type (as I said, that was originally not necessarily intended, since we could have different solvers based on the problem as well). So I will go with that. Since it is getting late (or dark here at least), I might just start with that today still (it is a little dull work to introduce that for every solver).
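
To summarize where this landed, here is a rough sketch of both pieces (the stand-in types below only mirror the AbstractSolverState hierarchy; names and defaults are not the final API):

using Manopt, Manifolds

# stand-ins for the solver state hierarchy discussed above
abstract type MySolverState end
struct MyGradientState <: MySolverState end

# default keyed on the state type, overridable per manifold (or class of manifolds)
default_stepsize_sketch(M::AbstractManifold, ::Type{<:MySolverState}) = ConstantStepsize(1.0)
default_stepsize_sketch(M::Euclidean, ::Type{<:MySolverState}) = ArmijoLinesearch(M)

# a high-level interface can resolve the stepsize before any state exists
function my_gradient_descent(M, f, grad_f, x;
    stepsize = default_stepsize_sketch(M, MyGradientState),
)
    return gradient_descent(M, f, grad_f, x; stepsize=stepsize)
end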

mateuszbaran commented 1 year ago

Cool, thanks 🙂