SciML / Optimization.jl

Mathematical Optimization in Julia. Local, global, gradient-based and derivative-free. Linear, Quadratic, Convex, Mixed-Integer, and Nonlinear Optimization in one simple, fast, and differentiable interface.
https://docs.sciml.ai/Optimization/stable/
MIT License

Add Manopt.jl wrapper #712

Closed – Vaibhavdixit02 closed this 5 months ago

Vaibhavdixit02 commented 7 months ago

Aimed at moving the already substantial work done in #437 forward

Vaibhavdixit02 commented 7 months ago

In addition to the existing implementation picked up from that PR, it would be desirable to support an interface for specifying the manifold as a constraint or as metadata of the optimization variable, especially from the MTK interface.

Vaibhavdixit02 commented 7 months ago

But maybe the solver approach here makes more sense, need to think more about this

kellertuer commented 7 months ago

Hi, thanks for taking over (and sorry for being so inactive on the other one). I can try to comment a bit more on the current state here, but first one remark on the constraint idea.

I prefer to think of the manifold as a domain, not a constraint. The reason is that the algorithms in Manopt work intrinsically, without having any embedding (where the manifold would be the constraint) in mind: they require an inner product on the tangent spaces, a vector transport (sometimes), and a retraction (maybe its inverse as well), but not a projection (for SPD matrices, for example, such a projection does not even exist, since that set is open in its embedding).
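
To give a concrete feel for what “intrinsically” means, here is a small illustrative sketch using Manifolds.jl – only the generic tools mentioned above, none of which needs a projection onto the set (the specific manifold and numbers are just for illustration):

    using Manifolds
    M = SymmetricPositiveDefinite(3)
    p = rand(M)                   # a point on the manifold
    X = rand(M; vector_at = p)    # a tangent vector at p
    inner(M, p, X, X)             # inner product on the tangent space at p
    q = retract(M, p, X)          # retraction: move away from p roughly in direction X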

The JuMP extension that Benoît was so kind to program does this in a nice way, I think; see https://manoptjl.org/stable/extensions/#JuMP.jl

They use `@variable(model, U[1:2, 1:2] in Stiefel(2, 2), start = 1.0)`, so writing `p in M` in the variable declaration specifies the domain of the variable `p` to be the manifold `M`.
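
A rough end-to-end sketch of that route (hedged: the optimizer name `Manopt.JuMP_Optimizer` and the toy quadratic objective are assumptions on my side; treat the linked extension docs as authoritative):

    using JuMP, Manopt, Manifolds
    A = [2.0 0.0; 0.0 1.0]
    model = Model(Manopt.JuMP_Optimizer)   # assumed name of the MOI wrapper from the extension
    @variable(model, U[1:2, 1:2] in Stiefel(2, 2), start = 1.0)
    @objective(model, Min, sum((A .- U) .^ 2))
    optimize!(model)
    value.(U)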

mateuszbaran commented 7 months ago

But maybe the solver approach here makes more sense, need to think more about this

In theory, the manifold is clearly part of the problem definition, not of the solver. As Ronny wrote, it may even be considered a domain and not a constraint, similarly to how boolean and integer parameters are usually not handled as constraints on real values but as a thing of their own. In those cases there is overwhelmingly clear evidence that this is the way to go, but IMO in the case of manifolds it's still an open problem.

One interesting benefit of treating them like domains is that manifold optimizers like Manopt.jl minimize the need to flatten the structure of variables and can naturally support tangents of a different type than the optimized variable. I'm not sure how well known that is, but "you don't have to flatten" would be my best argument at this time for considering manifolds a domain.

In practice, if you only ever use manifolds with Manopt.jl, it would likely be easier to make them part of the solver.

Vaibhavdixit02 commented 7 months ago

With the symbolic interface (through MTK) we should be able to support something similar to the domain definition; to me that seems most intuitive as well. But I can see that being a bit cumbersome when you have to define multiple variables, and since they usually all have to belong to the same manifold, the solver approach does turn out to be easier from both the user and the development point of view. So I'd like to support both.

kellertuer commented 7 months ago

For multiple variables we have the power manifold (even with the shortcut `M^3` for manifold-valued vectors or `M^(3,4)` for a matrix of variables on a manifold). Product manifolds are also available (using something like `M × N`, overloading the `×` operator). This should make multiple variables easier – and it already exists.
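
For illustration, a small sketch with Manifolds.jl:

    using Manifolds, LinearAlgebra
    M = Sphere(2)
    N = Euclidean(3)
    P1 = M^3         # power manifold: a vector of three points on the sphere
    P2 = M^(3, 4)    # power manifold: a 3×4 matrix of points on the sphere
    Q = M × N        # product manifold: a point on the sphere paired with a vector in ℝ³
    p = rand(P1)     # these manifolds can be used like any other, e.g. for random points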

kellertuer commented 6 months ago

The newest test already looks quite nice, just that philosophically I would prefer to have the manifold in the objective not the solver.

Vaibhavdixit02 commented 6 months ago

I completely agree @kellertuer, but I can't think of a solution to this right now. In a functional interface it is pretty hard to find a membership syntax. If you think about it, this is pretty similar to how Manopt.jl's interface looks as well, so it shouldn't be very unintuitive.

    using Optimization, OptimizationManopt, Manopt, Manifolds, LinearAlgebra

    M = SymmetricPositiveDefinite(5)
    m = 100
    σ = 0.005
    q = Matrix{Float64}(I, 5, 5) .+ 2.0
    data2 = [exp(M, q, σ * rand(M; vector_at = q)) for i in 1:m];

    # the objective, in both candidate signatures (with and without the manifold argument)
    f(M, x, p = nothing) = sum(distance(M, x, data2[i])^2 for i in 1:m)
    f(x, p = nothing) = sum(distance(M, x, data2[i])^2 for i in 1:m)

    optf = OptimizationFunction(f, Optimization.AutoForwardDiff())
    prob = OptimizationProblem(optf, data2[1])

    opt = OptimizationManopt.GradientDescentOptimizer(M)
    @time sol = Optimization.solve(prob, opt)

vs

    n = 100
    σ = π / 8
    M = Sphere(2)
    p = 1 / sqrt(2) * [1.0, 0.0, 1.0]
    data = [exp(M, p, σ * rand(M; vector_at = p)) for i in 1:n];
    f(M, p) = sum(1 / (2 * n) * distance.(Ref(M), Ref(p), data) .^ 2)
    grad_f(M, p) = sum(1 / n * grad_distance.(Ref(M), data, Ref(p)));
    m1 = gradient_descent(M, f, grad_f, data[1])

In Manopt.jl, too, the manifold gets passed only at the optimizer invocation.

kellertuer commented 6 months ago

My fear is that this is too prone to error for the following reason.

You can think of

m1 = gradient_descent(M, f, grad_f, data[1])

as

m1 = gradient_descent((M, f, grad_f), data[1])

so internally we actually have an objective (that would store f and grad_f) and a problem that stores the objective and the manifold (!). Already the initial point (here data2[1]?) is, in Manopt.jl, part of the “settings” of the solver, not of the problem (but that might really be philosophical).

But that is not so nice to call, so I would prefer a syntax more like

    optf = OptimizationFunction(f, Optimization.AutoForwardDiff())
    prob = OptimizationProblem(M, optf, data2[1])

    opt = OptimizationManopt.GradientDescentOptimizer()
    @time sol = Optimization.solve(prob, opt)

(Yes, I have not looked at the internals, sorry.) This would actually be what Manopt does.

Otherwise I fear that a user might too easily do

    opt = OptimizationManopt.GradientDescentOptimizer(M2) # for some random other manifold
    @time sol = Optimization.solve(prob, opt)

and that is really something I would avoid leading the user into – not only because I do not want to explain (again and again) why this fails so tremendously. Even if you take the same manifold with just another metric, your gradient might already be completely wrong.

Vaibhavdixit02 commented 6 months ago

That makes sense, I'll figure out the best way to make it work

kellertuer commented 6 months ago

One thing one could do is define a new constructor for the OptimizationProblem whose first argument is a manifold. This would be stored in the keywords and passed on to a solver. Would that make sense?
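
A minimal sketch of that idea (hypothetical code, assuming `OptimizationProblem` keeps forwarding unrecognized keyword arguments in its `kwargs` field, where the Manopt wrapper could look the manifold up later):

    using SciMLBase, ManifoldsBase

    # hypothetical convenience constructor: accept the manifold positionally and
    # store it in the problem's keyword arguments for the Manopt wrapper to pick up
    function SciMLBase.OptimizationProblem(M::ManifoldsBase.AbstractManifold,
            f::SciMLBase.OptimizationFunction, x0, p = SciMLBase.NullParameters();
            kwargs...)
        return SciMLBase.OptimizationProblem(f, x0, p; manifold = M, kwargs...)
    end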

I checked both the objective and the problem. While both have quite a few fields, they all seem to be very much tailored towards constrained problems and towards (up to second-order) objectives and solvers.

Vaibhavdixit02 commented 6 months ago

This should be ready now.

The syntax looks like this

    optf = OptimizationFunction(f, Optimization.AutoForwardDiff())
    prob = OptimizationProblem(M, optf, data2[1], manifold = M)

and if the optimizer's manifold doesn't match the problem's it throws an error

"Either manifold not specified in the problem `OptimizationProblem(f, x, p; manifold = SymmetricPositiveDefinite(5))` or it doesn't match the manifold specified in the optimizer `$(opt.M)`"

kellertuer commented 6 months ago

The syntax looks like this

    optf = OptimizationFunction(f, Optimization.AutoForwardDiff())
    prob = OptimizationProblem(M, optf, data2[1], manifold = M)

That is exactly what I had in mind

and if the optimizer's manifold doesn't match the problem's it throws an error

"Either manifold not specified in the problem `OptimizationProblem(f, x, p; manifold = SymmetricPositiveDefinite(5))` or it doesn't match the manifold specified in the optimizer `$(opt.M)`"

It is already quite late here, so I will only have time tomorrow afternoon to check the whole code – but what is the reason to store the manifold in the optimizer? Sure, the constructors sometimes need the manifold, so we could keep that (for them to fill in reasonable defaults), but if the optimizer did not store the manifold (since it is now stored in the problem anyway), then why store-twice-and-error instead of store-once-and-be-happy?

kellertuer commented 6 months ago

One thing I do not yet fully understand and/or see is whether each and every solver in Manopt needs a wrapping struct here. But maybe that is necessary.

That is for now all I have in comments.

Vaibhavdixit02 commented 6 months ago

I don't like that either. I was actually trying to figure out how to avoid that already!

kellertuer commented 6 months ago

I can check that as well, but not for the next week or two, since we are currently in the last few weeks of the semester, and besides designing an exam somewhere in between, the requests from students are also on the rise.

For now I was just wondering about that, because it seems to store a few parameters/settings (like the retraction or the vector transport) but not others. But on the other hand, solvers are usually also specified in Optimization.jl by structs, right? At least that is how I read the second argument of the line `sol = solve(prob, BFGS())` from the readme; sure, the structs here are maybe really just the same as that (it's just that here the retraction and the vector transport often have to be stored as well).

Vaibhavdixit02 commented 6 months ago

I think it is possible to avoid them. Will push once I get it working

Vaibhavdixit02 commented 6 months ago

Oh, I was wrong: since you don't have any method structs defined in Manopt.jl, we'd need to create them here for the API to match the other solver wrappers.

kellertuer commented 6 months ago

Can you elaborate a bit what you mean with method structs?

The reason I came to that question is that I think all the wrapper structs here are very similar (just with fewer variables stored) to the structs called states in Manopt.jl (https://github.com/JuliaManifolds/Manopt.jl/blob/1ea2ac69d1ec48f3029841ddcee4af8958d361af/src/solvers/gradient_descent.jl#L2). So what the wrapper here currently does is: invent a new problem (https://manoptjl.org/stable/plans/problem/), objective (https://manoptjl.org/stable/plans/objective/), and a state like the one above, unwrap all that when called, and call the (high-level) interface gradient_descent, which internally again produces a problem, an objective, and a state. When all that is finished the result is returned, but I feel this currently doubles all the machinery of problem, objective, and state. Maybe that is necessary; I have not yet fully understood the Optimization API to that extent.

Vaibhavdixit02 commented 6 months ago

@kellertuer the Optimization.jl API looks like this in general

f(x, p) = ...
constraints(x , p) = ...
optf = OptimizationFunction(f, AutoEnzyme())
prob = OptimizationProblem(optf, x0)
sol = solve(prob, Optimizer())

Hence it needs a struct to instantiate the `Optimizer()`. Since Manopt doesn't have its own types/structs for these, we have to create them here.
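
For illustration, such a wrapper can be as small as the following sketch (the field layout is an assumption on my side; the thread above already uses `OptimizationManopt.GradientDescentOptimizer(M)`):

    # one wrapper struct per Manopt solver, so that
    # solve(prob, GradientDescentOptimizer(M)) has a type to dispatch on
    struct GradientDescentOptimizer{TM<:ManifoldsBase.AbstractManifold}
        M::TM
    end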

I did see the AbstractManoptSolverState, and it could be used to create the list of all the methods, something like

julia> subtypes(Manopt.AbstractManoptSolverState)
15-element Vector{Any}:
 AbstractGradientSolverState
 AbstractPrimalDualSolverState
 AdaptiveRegularizationState
 ConvexBundleMethodState
 CyclicProximalPointState
 DebugSolverState
 DouglasRachfordState
 LanczosState
 Manopt.AbstractSubProblemSolverState
 Manopt.ReturnSolverState
 NelderMeadState
 ParticleSwarmState
 ProximalBundleMethodState
 RecordSolverState
 SubGradientMethodState

but since call_manopt_optimizer will still need to be manually defined for all of them, it would probably be equal work, haha

kellertuer commented 6 months ago

Hence it needs a struct to instantiate the `Optimizer()`. Since Manopt doesn't have its own types/structs for these, we have to create them here.

GradientDescentState is exactly such a type/struct. It stores all settings the solver (in this example gradient descent) needs. The one thing that might be a bit different in Manopt.jl is that gradient_descent is the function meant to be used by an end user and GradientDescentState is more of an internal thing. Also, all AbstractManoptState subtypes by convention have M as their first argument in the constructor, and usually a few mandatory arguments as well (augmented Lagrangian, for example, has two mandatory sets of parameters to specify the sub-problem and sub-solver). But at least the ones without sub-solvers quite often have a lot of defaults, so then only the manifold is a parameter (the manifold is not stored in the state, just used to, for example, pick a good default retraction).

That is why I am not sure it is worth “inventing” half-states here that store parts of those, just to dispatch on call_manopt_optimizer. But OK, that function I also have not yet fully understood.

Vaibhavdixit02 commented 6 months ago

I'd be very happy not to have to reimplement all that for every solver, but right now it seems impossible based on my understanding. Maybe @mateuszbaran can also suggest whether we can avoid this and rely on some more generic interface to solvers in Manopt?

kellertuer commented 5 months ago

I hope I made it clear that I am not 100% sure that works, since I am not yet 100% sure how Optimization.jl works in all details; it just seems the new structs introduce a bit of a “double structure”. Yeah, but maybe Mateusz sees that a bit better.

mateuszbaran commented 5 months ago

Manopt currently lacks a solver structure that fits here. State is supposed to be constructed when you have at least a bit of information about the problem, and the only other option is solver_name functions. They are fairly uniform.

I had a very similar problem when working on hyperparameter optimization for Manopt: https://juliamanifolds.github.io/ManoptExamples.jl/stable/examples/HyperparameterOptimization/ . The lack of a problem-independent solver structure made that code uglier than I'd like.

Anyway, I think something like this would be a bit nicer than specializing in great detail for each solver:


struct ConcreteManoptOptimizer{TF,TKW<:NamedTuple} <: AbstractManoptOptimizer
    optimizer::TF
    kwargs::TKW
end

ConcreteManoptOptimizer(f) = ConcreteManoptOptimizer(f, (;))

# for example
# solver = ConcreteManoptOptimizer(quasi_Newton)

function call_manopt_optimizer(
    M::ManifoldsBase.AbstractManifold, opt::ConcreteManoptOptimizer,
    obj::AbstractManifoldObjective,
    x0;
    stopping_criterion::Union{Manopt.StoppingCriterion, Manopt.StoppingCriterionSet},
    kwargs...)

    # pass the objective directly to the high-level Manopt solver
    opts = opt.optimizer(M,
        obj,
        x0;
        return_state = true,
        stopping_criterion,
        kwargs...)
    # we unwrap DebugOptions here
    minimizer = Manopt.get_solver_result(opts)
    # evaluate the final cost through the objective
    return (; minimizer = minimizer,
        minimum = Manopt.get_cost(M, obj, minimizer),
        options = opts)
end

And here:

    opt_res = call_manopt_optimizer(manifold, cache.opt, _loss, gradF, cache.u0;
        solver_kwarg..., stopping_criterion = stopping_criterion)

You'd have to construct a manifold objective:

    obj = ManifoldGradientObjective(_loss, gradF; evaluation = evaluation)
    opt_res = call_manopt_optimizer(manifold, cache.opt, obj, cache.u0;
        solver_kwarg..., stopping_criterion = stopping_criterion)

or

ManifoldCostObjective(f)

for gradient-free optimization, or

ManifoldHessianObjective(f, grad_f, Hess_f, preconditioner; evaluation=evaluation)

for optimization with Hessian-vector products.
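
One way the wrapper could pick among these three, depending on which derivatives the user supplied (a sketch; the helper name `build_objective` and its argument handling are hypothetical):

    function build_objective(f, grad_f, hess_f, precond;
            evaluation = Manopt.AllocatingEvaluation())
        if hess_f !== nothing
            # the preconditioner is passed through, as in the constructor shown above
            return ManifoldHessianObjective(f, grad_f, hess_f, precond; evaluation = evaluation)
        elseif grad_f !== nothing
            return ManifoldGradientObjective(f, grad_f; evaluation = evaluation)
        else
            return ManifoldCostObjective(f)
        end
    end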

If this doesn't work for some solver, we could either fix it in Manopt or specialize call_manopt_optimizer for that solver.

Note that some solvers don't strictly require x0 but that is not particularly important at this stage.

Maybe ConcreteManoptOptimizer could even exist in Manopt since it would also help make the hyperparameter optimization example nicer.

Another note: Armijo and WolfePowell are not great line search algorithms. https://github.com/mateuszbaran/ImprovedHagerZhangLinesearch.jl/ is currently the best one available for Manopt. It might get integrated into LineSearches.jl at some point, but currently that doesn't have a particularly high priority (see https://github.com/JuliaNLSolvers/LineSearches.jl/pull/174).

kellertuer commented 5 months ago

I do agree that the `solver_name` (high-level) functions were always meant to be the ones used by end users, but could you elaborate on

Manopt currently lacks a solver structure that fits here. State is supposed to be constructed when you have at least a bit of information about the problem

? I think the main thing the solver states do need is a manifold. Only the ones with sub solvers sometimes have some nice fallbacks where the sub problem is automatically created from the main objective as a default – but otherwise the user has to specify a full subsolver anyways.

For example https://manoptjl.org/v0.4/solvers/gradient_descent/#Manopt.GradientDescentState – sure needs a manifold like nearly anything in Manopt, but that is also just needed to initialise the internal fields.

Concerning the need of an x0 (or whether a random one is chosen) – we should unify that to always fall back to a rand.

I like your approach and if some states do not yet fit that we could check how we can adapt them :)

mateuszbaran commented 5 months ago

I do agree that the `solver_name` (high-level) functions were always meant to be the ones used by end users, but could you elaborate on

Manopt currently lacks a solver structure that fits here. State is supposed to be constructed when you have at least a bit of information about the problem

? I think the main thing the solver states do need is a manifold. Only the ones with sub solvers sometimes have some nice fallbacks where the sub problem is automatically created from the main objective as a default – but otherwise the user has to specify a full subsolver anyways.

States sometimes also need a point for initialization, and requiring a manifold simply rules them out for hyperparameter optimization. And, in your own words:

The newest test already looks quite nice, just that philosophically I would prefer to have the manifold in the objective not the solver.

For example https://manoptjl.org/v0.4/solvers/gradient_descent/#Manopt.GradientDescentState – sure needs a manifold like nearly anything in Manopt, but that is also just needed to initialise the internal fields.

Concerning the need of an x0 (or whether a random one is chosen) – we should unify that to always fall back to a rand.

I like your approach and if some states do not yet fit that we could check how we can adapt them :)

This is IMO not a great solution; these states don't look like they were intended to be used like that. And ugly things start happening when the default point doesn't match what the user wants to provide through the Optimization.jl interface. And we have two places where the manifold is needed – not great.

kellertuer commented 5 months ago

Well, for the point, we should then move to using rand(M) for them. But we will not be able to get rid of the manifold, since we definitely need it for said rand – and to initialise (usually zero) tangent vectors. I do not see any way around that.

Sure, the manifold is needed in two places – but I do not see how we can avoid that at all; it is structurally necessary to initialise some fields and types.

mateuszbaran commented 5 months ago

Ideally we should have the manifold in only one place, either the objective or the solver. You suggested you prefer the manifold to be in the objective, so we can't use Manopt states to represent solvers here. And IMO states shouldn't really be constructed until we have an initial search point, which gives another reason why states are not suitable here.

Vaibhavdixit02 commented 5 months ago

The state should map to OptimizationState, https://docs.sciml.ai/Optimization/stable/API/optimization_state/. The solver structs are for telling `solve` “this is the optimizer I want to use”; the states are more for storing how the optimization is progressing – and since it has “State” in its name, it can't directly be used here as the solver struct.

kellertuer commented 5 months ago

Hm. I do not see how states (as they are designed in Manopt.jl) could ever be generated without a manifold; something like storing the iterate requires initialising that field at some point (and knowing its type, since all states are parametrised with that). So we cannot do magic.

Ah, now I see – until now I thought “solver struct” was the Optimization.jl alias for a state, though I saw the state thingy. Then the state should definitely not be used in the “solver struct” setting – even more so since it is not suitable. The question now is: should we have one “solver struct” for every Manopt solver? Or is a generic one enough (and more flexible)?

Vaibhavdixit02 commented 5 months ago

Or is a generic one enough (and more flexible)

It'd need to be for each solver.

kellertuer commented 5 months ago

Then my conclusion is: the current way of doing this is probably the best we can do.

Vaibhavdixit02 commented 5 months ago

Cool, I'll merge this PR and continue fleshing it out more with a fresh PR to keep it contained and easier to review

kellertuer commented 5 months ago

Oh, I am still surprised every time how fast SciML merges; I am not yet convinced this works safe and sound – but it seems our philosophies on merging are maybe different.

Again, thanks for working on this :) Once this works, I will adapt the Manopt readme for sure.

kellertuer commented 5 months ago

And by “works” I also mean officially mentioned in the readme and docs :)

Vaibhavdixit02 commented 5 months ago

Notice that it's not going to be a registered package until we reach convergence, so the merging is more for dev and review convenience rather than putting it out for end users yet!

kellertuer commented 5 months ago

Yes, I saw that. As I said, it is just a slightly different philosophy. My idea is usually that what is on master is basically “ready to put out” and maybe just waiting for one or two other PRs to make a nice new registered version.

Vaibhavdixit02 commented 4 months ago

@kellertuer @mateuszbaran Sorry I got busy with end-of-semester things since we last talked about this. I am hoping to get this out in the next few days.

  1. The main missing piece I see is just wrapping more of the solvers from Manopt here.
  2. I am not completely sure how constraint handling should be implemented here. It looks like the interface is mainly the exact penalty and augmented Lagrangian methods in Manopt, so maybe the user can pass those in as the solver, with the unconstrained solver as a field? I have recently done an augmented Lagrangian implementation in Optimization.jl which I am planning to spend effort on making efficient and generic enough for most solvers, so that's an option as well.
  3. Anything else you see as a major blocker for this to be usable?

kellertuer commented 4 months ago

No worries, I am happy that Manopt will be available here at some point and will then link to it, but it is not urgent; and sure, sometimes there are teaching and semester things.

  1. Sure, that would be great.
  2. The point here is that besides ALM and EPM I am only aware of one further algorithm: interior point Newton, from the beginning of this year. I currently have a master's student looking at that within Manopt.jl. I do not understand your part “maybe the user can pass those in as the solver with the unconstrained solver as a field” – those == the constraints? Then “as” does not make sense. However, constrained optimization on manifolds is a very new field (ALM & EPM are from a paper from 2020), so I cannot say much about how that would fit into a generic framework (when that stems mainly from “Euclidean ideas” – maybe it does, maybe not; manifolds are sometimes tricky).
  3. I am a bit unsure what you are referring to with “this” – the interface here? No, that looks great! The constraint algorithms? Not sure, since I fear I have not understood much of your point 2.

mateuszbaran commented 4 months ago

2. I am not completely sure how constraint handling should be implemented here. It looks like the interface is mainly the exact penalty and augmented Lagrangian methods in Manopt, so maybe the user can pass those in as the solver, with the unconstrained solver as a field? I have recently done an augmented Lagrangian implementation in Optimization.jl which I am planning to spend effort on making efficient and generic enough for most solvers, so that's an option as well.

We are currently reworking constraint handling in Manopt. For box constraints I have added the Hyperrectangle manifold with corners, and our quasi-Newton solvers should work correctly with it since https://github.com/JuliaManifolds/Manopt.jl/pull/388 . How is this different from L-BFGS-B? Well, the classic L-BFGS-B is more careful about computing directions of descent, so that it never encounters a non-positive direction, while Manopt.jl currently just resets the quasi-Newton operator and goes towards the negative gradient. It's less efficient, but this way we can handle mixed box-manifold constraints, and it appears to still be convergent. Another thing is line search – L-BFGS-B has a specialized line search that looks for potential points of nondifferentiability, while we do not. Again, it's not an ideal approach, but I wouldn't be surprised if this were the SoTA for mixed box-manifold constraints. I'm also going to try a similar approach for the positive semi-definite manifold with corners.

The second part of our rework is https://github.com/JuliaManifolds/Manopt.jl/pull/386 where we are going to provide a more performance-oriented interface for arbitrary equality and inequality constraints (and more). I'm looking for some standard benchmark for such problems; if you have anything, I'd very much like to add Manopt there. I can write the Manopt benchmarking myself if you have it working for other optimization libraries. I was thinking about CUTEst.jl, but I couldn't find any complete benchmark based on it for any Julia library.

3. Anything else you see as a major blocker for this to be usable?

For box constraints and mixed box-manifold constraints, you should be good to go. For arbitrary constraints I would suggest waiting until https://github.com/JuliaManifolds/Manopt.jl/pull/386 is finished. Though if you have a benchmark, I could use it to identify potential performance issues in that PR.