JuliaManifolds / Manopt.jl

🏔️ Manopt.jl – Optimization on Manifolds in Julia
http://manoptjl.org

Introduce Aqua.jl-based checks #379

Closed kellertuer closed 4 months ago

kellertuer commented 4 months ago

I added tests with Aqua.jl.

While we have a few ambiguity warnings that I have ignored for now, they appear in different variants of the high-level solvers and should usually not be a problem.

But there is one case of unbound type parameters that I could not resolve. Maybe @mateuszbaran you can take a look? The problem with

https://github.com/JuliaManifolds/Manopt.jl/blob/f5461838e7b6d03107522fcfe3a544aba306d222/src/plans/solver_state.jl#L415-L437

is that in case it is a vector of symbols, the two parameters TTS and TPS are declared but not used; I tried to localise them by putting them into the union (the trick that otherwise helped), but then the code uses TPS and TTS, and that is not possible if they are local to the union. Is there any other good way to access the parameters you need?
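A minimal, hypothetical sketch of the pattern Aqua flags (the names `DebugGroup` and `set_params` are made up for illustration, not Manopt's actual code): when the `Vector{Symbol}` branch of the union matches, the parameters in the `where` clause are never bound to anything.

```julia
struct DebugGroup{TPS,TTS}
    ps::TPS
    ts::TTS
end

# Aqua.detect_unbound_args flags this method: for x::Vector{Symbol},
# neither TPS nor TTS appears in the matched type, so both stay unbound.
function set_params(x::Union{Vector{Symbol},DebugGroup{TPS,TTS}}) where {TPS,TTS}
    return x isa Vector{Symbol} ? x : (x.ps, x.ts)
end
```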

I will keep this open for a while and maybe work a bit on the ambiguities, though that might require a breaking change as well. At least for the last step size I will try to resolve that.

kellertuer commented 4 months ago

I think I found something. This was fun, and I removed a few ambiguities already. The remaining ones I might think about, but they are a bit tricky when we allow both solver(M, f, grad_f, rest...; kwargs...) and solver(M, objective, rest...; kwargs...), especially when rest also contains the two forms of sub-solvers we currently support (sub-problem/state and the closed-form variant as a function).
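A small self-contained sketch (hypothetical names, not Manopt's actual signatures) of how two such entry points collide: one method is more specific in the objective position, the other in the sub-solver position, so neither wins and Julia reports an ambiguity.

```julia
abstract type AbstractObjective end
struct Objective <: AbstractObjective end

# Sub-solver passed in closed form as a function:
solver(M, f, sub_solver::Function) = :closed_form
# Objective-based entry point with an arbitrary sub-problem:
solver(M, objective::AbstractObjective, sub_problem) = :sub_problem

# solver(M, Objective(), sin) matches both methods; the first is more
# specific in argument 3, the second in argument 2, so neither is
# preferred and the call raises an ambiguity MethodError.
```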

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.76%. Comparing base (f546183) to head (37af2ae).

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##           master     #379    +/-   ##
========================================
  Coverage   99.76%   99.76%
========================================
  Files          74       73       -1
  Lines        7241     7243       +2
========================================
+ Hits         7224     7226       +2
  Misses         17       17
```

:umbrella: View full report in Codecov by Sentry.

mateuszbaran commented 4 months ago

> But there is one case of unbound parameters that I could not resolve. Maybe @mateuszbaran you can take a look? The problem with

I will rewrite it

> This was fun and I removed a few ambiguities already. The remaining ones I might think about, but they are a bit tricky, when we allow both solver(M, f, grad_f, rest...; kwargs...) as well as solver(M, objective, rest...; kwargs...)

This may be a bit hard to resolve.

kellertuer commented 4 months ago

You would also usually not mix them up, because that would basically require providing an objective, a problem, and another problem or such, so I am even fine with keeping the high-level interfaces on ignore. It is also just a few that can be called in different formats: the difference-of-convex solvers have a few different formats that “conflict” in an ambiguity you would never hit in reality.

kellertuer commented 4 months ago

> I will rewrite it

Oh, a very nice rewrite dispatching on that again :) Well done.

kellertuer commented 4 months ago

I checked both get_last_stepsize and the higher-level ones. The last step size would need a bit of a (breaking) rework, which I could do before the next breaking release.

For the high-level ones, I neither feel that this would be easy to do nor that the ambiguity is a problem in practice. One would basically have to really mix two of the documented forms and provide two sub-problem solvers, one in closed form and one as a problem, and not provide a state. The reason I do not want to narrow down the types of those parameters (which could also resolve the ambiguity) is that I do not want to restrict f or grad_f to, for example, Function.
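To illustrate why restricting to ::Function would be too narrow (a sketch with a made-up cost type, not from Manopt): callable structs are perfectly good cost functions but are not subtypes of Function.

```julia
# A cost implemented as a callable struct, e.g. carrying its own data.
struct LeastSquaresCost{T}
    data::T
end
(c::LeastSquaresCost)(M, p) = sum(abs2, p .- c.data)

c = LeastSquaresCost([1.0, 2.0])
c isa Function  # false — a signature solver(M, f::Function, ...) would reject it
```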

mateuszbaran commented 4 months ago

particle_swarm should actually be fixed IMO, see this call:

```julia
M = Circle()
f(::Circle, p) = p * 2 * p
swarm = [-π / 4, 0, π / 4]
s = particle_swarm(M, ManifoldCostObjective(f), swarm)
@test s ≈ 0.0
```

I will take a look at other ones too.

EDIT: stochastic_gradient_descent and difference_of_convex_proximal_point have a similar issue with the Number type.

mateuszbaran commented 4 months ago

truncated_conjugate_gradient_descent ambiguity should be harmless, and get_last_stepsize should probably be redesigned to fix those ambiguities.

kellertuer commented 4 months ago

Ah, I see the point with particle swarm; the main tricky thing really seems to be that we aim to cover the single-number case as well, but hopefully it should be fine with just one extra case each.

For the last step size I agree; I would consider that function more internal, though. Currently its signatures are a bit all over the place and should be minimised/unified; I can check whether I see a way without breaking (too much).

kellertuer commented 4 months ago

I fixed the last step size. There were a few too many interim dispatches that caused trouble, so I removed an unnecessary interim one that was only explicitly called in a test and used nowhere else. I documented the remaining ones more carefully and added an entry to the changelog about this.

kellertuer commented 4 months ago

I've taken a look at particle_swarm. The main problem is the two cases

where for the first we have a method that also works if the swarm is a vector of numbers. For the second we do not, which leads to an ambiguity. The problem here is that while mco does carry a cost inside, and we could use get_cost_function to “extract” it and make it suitable for numbers, we cannot “exchange” it in mco; at least I have no clue how to do that if mco is for example “wrapped” in a cache.

Long story short, I can only resolve this by writing a particle_swarm(M, mco, swarm::AbstractVector{T}) where {T<:Number} that errors. The only alternative would be to remove the variant that works on numbers, but that would even break the tests we have on that.
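The workaround described above could look roughly like this (a sketch; the objective type is a stand-in, only the erroring signature comes from the comment above):

```julia
# Stand-in for the real objective type, just to make the sketch runnable.
struct ManifoldCostObjectiveStub end

# The general variant: swarm is a vector of points (e.g. vectors).
particle_swarm(M, mco::ManifoldCostObjectiveStub, swarm::AbstractVector) = :runs

# The disambiguating method: explicitly reject a swarm of numbers.
function particle_swarm(
    M, mco::ManifoldCostObjectiveStub, swarm::AbstractVector{T}
) where {T<:Number}
    return error("particle_swarm with an objective and a swarm of numbers is not supported.")
end
```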

kellertuer commented 4 months ago

difference_of_convex_proximal_point has exactly the same problem: the only mandatory thing is a single function, just that there it is grad_h (instead of f), but the effect is completely the same as for PSO.

mateuszbaran commented 4 months ago

I see, that's indeed rather unfixable. I'd suggest adding an error hint like here: https://github.com/JuliaManifolds/Manifolds.jl/blob/4cbf30f44654f0be462193fdea7a164a3d746264/src/Manifolds.jl#L581-L588 and keeping it an ambiguity.
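The hint mechanism in that link is Base.Experimental.register_error_hint; a minimal self-contained sketch of the pattern (with a made-up function f_num standing in for particle_swarm) could look like this:

```julia
# A method that only accepts vectors of vectors, not a vector of numbers.
f_num(swarm::AbstractVector{<:AbstractVector}) = :ok

# Register a hint that gets appended to the MethodError message.
Base.Experimental.register_error_hint(MethodError) do io, exc, argtypes, kwargs
    if exc.f === f_num && !isempty(argtypes) && argtypes[1] <: AbstractVector{<:Number}
        print(io, "\nHint: calling this with a swarm of numbers is not supported.")
    end
end
```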

kellertuer commented 4 months ago

truncated_conjugate_gradient_descent covers not only a few deprecated cases, but also one where it is hard to distinguish the (optional) Hessian from the optional initial value (which unfortunately is both a point p and a tangent vector X), such that X and the Hessian get into conflict. That leads to 3 ambiguities (the other 3 are resolved by a breaking change, removing the deprecated ones).

I am not sure what to do about these, since we usually type neither p nor X, and while the Hessian is currently “recognized” by being a function, we already discussed that we maybe do not want that either. One could make the functors subtype Function, but that is maybe also restrictive. So maybe the general question is:

If both a function and the initial point are optional, how to best distinguish them? Ideally I would prefer to keep both positional, since that is “nice to use”, but that is probably hard to make unambiguous. So an alternative (but breaking) change could be to move all optional functions (like the Hessian here) into (very uniformly named) keyword arguments. (This would not resolve the two problems above, but well.)

kellertuer commented 4 months ago

> I see, that's indeed rather unfixable. I'd suggest adding an error hint like here: https://github.com/JuliaManifolds/Manifolds.jl/blob/4cbf30f44654f0be462193fdea7a164a3d746264/src/Manifolds.jl#L581-L588 and keeping it an ambiguity.

I have no clue how to write such a hint, but something like “doing PSO with an objective and a swarm consisting of numbers is not supported”. One could of course also just start such an algorithm, assuming that a user providing an objective knows what they are doing and it will be the right one already, so one would just generate the “array swarm”.

kellertuer commented 4 months ago

Ah, one thing that might help (though maybe it is also a bit silly interface-wise) is to, for example, make the initial point always a tuple, so one would call

```julia
difference_of_convex_proximal_point(M, grad_h, (p0,))
```

which would remove the ambiguity with the Hessian, though not the f-vs-objective (or grad_h-vs-objective) one. I would prefer keeping p0 as the (always last) positional argument, though, I think, because we dispatch on that for number types at least; otherwise one could consider moving p0 to an initial_value keyword or so (but that would again be quite breaking as well).
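To make the tuple idea concrete (a sketch with a stand-in name dcpp, not the real solver): wrapping the start point in a tuple gives dispatch something to distinguish it from an optional function argument such as a Hessian.

```julia
# With an untyped third argument these two cases could not be told apart;
# the tuple makes the start-point variant unambiguous.
dcpp(M, grad_h, Hess::Function) = :with_hessian
dcpp(M, grad_h, p0::Tuple)      = :with_start_point
```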

mateuszbaran commented 4 months ago

> I have no clue how to write such a hint, but something like “doing PSO with an objective and a swarm consisting of numbers is not supported”. One could of course also just start such an algorithm, assuming that a user providing an objective knows what they are doing and it will be the right one already, so one would just generate the “array swarm”.

I will write that hint then.

> truncated_conjugate_gradient_descent covers not only a few deprecated cases, but also one where it is hard to distinguish the (optional) Hessian from the optional initial value (which unfortunately is both a point p and a tangent vector X), such that X and the Hessian get into conflict.

OK, could you mention it in a comment in the test file for Aqua?

> If both a function and the initial point are optional, how to best distinguish them? Ideally I would prefer to keep both positional, since that is “nice to use”, but that is probably hard to make unambiguous. So an alternative (but breaking) change could be to move all optional functions (like the Hessian here) into (very uniformly named) keyword arguments.

I would suggest making the Hessian a keyword argument and keeping the rest positional. 7 different call signatures in a docstring is a bit much, and it doesn't have the "no Hessian, no p" variant there, which should probably also work?

kellertuer commented 4 months ago

> I would suggest making the Hessian a keyword argument and keeping the rest positional. 7 different call signatures in a docstring is a bit much, and it doesn't have the "no Hessian, no p" variant there, which should probably also work?

That we should check carefully and then introduce for all solvers that need a Hessian but are fine with an approximation (or maybe even more generally: all solvers that require objective parts that are optional / can be “filled” with a good default).

mateuszbaran commented 4 months ago

> Ah, one thing that might help (though maybe it is also a bit silly interface-wise) is to, for example, make the initial point always a tuple, so one would call
>
> difference_of_convex_proximal_point(M, grad_h, (p0,))
>
> which would remove the ambiguity with the Hessian, though not the f-vs-objective (or grad_h-vs-objective) one. I would prefer keeping p0 as the (always last) positional argument, though, I think, because we dispatch on that for number types at least; otherwise one could consider moving p0 to an initial_value keyword or so (but that would again be quite breaking as well).

Making the initial point a tuple looks awful to me, and since it is, I think, the most important optional parameter of every solver, I would also try to avoid making it a keyword argument.

> That we should check carefully and then introduce for all solvers that need a Hessian but are fine with an approximation (or maybe even more generally: all solvers that require objective parts that are optional / can be “filled” with a good default).

Sure, that sounds like a good idea.

kellertuer commented 4 months ago

Ah, to be precise: p can never be optional for TCG, since it specifies the tangent space; the iterate is X, which is optional. I am not sure a random tangent space as a domain is something reasonable ;)

But I do agree, 7 signatures might be a few too many.

mateuszbaran commented 4 months ago

> Ah, to be precise: p can never be optional for TCG, since it specifies the tangent space; the iterate is X, which is optional. I am not sure a random tangent space as a domain is something reasonable ;)

There is such a signature there (the third one):

```julia
truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f; kwargs...)
```

kellertuer commented 4 months ago

> Making the initial point a tuple looks awful to me, and since it is, I think, the most important optional parameter of every solver, I would also try to avoid making it a keyword argument.

I fully agree that that would be quite ugly, and I feel the idea “p=rand(M) is always the last positional argument” (for TCG: a random tangent vector ;)) is a very good design principle.

Then we should check, for the next breaking release, making things like an optional Hessian a keyword argument. Some solvers already have f or similar as keywords, where they can be filled reasonably or are only necessary for more debug opportunities.
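The proposed convention could look like this (a sketch with made-up names; rand(M) stands for whatever random-point generator the domain provides, here Float64 is used as a toy stand-in for a manifold):

```julia
# Start point stays the last positional argument with a random default;
# the optional Hessian moves to a keyword argument.
function tcg_sketch(M, f, grad_f, p = rand(M); hessian = nothing)
    return hessian === nothing ? :approximate_hessian : :exact_hessian
end
```

Calling tcg_sketch(Float64, identity, identity) would then select the approximation path, while passing hessian=... selects the exact variant, with no positional ambiguity between the Hessian and p.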

kellertuer commented 4 months ago

I first thought this might just be a reasonable check to add; it turns out it even improves stability/reliability by forcing a rethink of a few choices that led to the ambiguities.

kellertuer commented 4 months ago

Is it OK if we do the rework of the interface (and its unification) in a new PR? I can also open an issue (and add it to the 1.0 roadmap) that sketches how to unify the high-level interfaces and how they accept functions.