Closed · kellertuer closed this 4 months ago
I think I found something. This was fun and I removed a few ambiguities already. The remaining ones I might think about, but they are a bit tricky when we allow both `solver(M, f, grad_f, rest...; kwargs...)` as well as `solver(M, objective, rest...; kwargs...)`, especially when `rest...` also contains the two forms for sub-solvers we currently support (sub-problem/state and the closed-form variant as a function).
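As a self-contained toy (hypothetical `solver` and `Objective`, not the actual Manopt.jl signatures), this is the kind of collision the two entry points can produce once `rest...` may contain a closed-form sub-solver as a function:

```julia
struct Objective end

# Function form: cost, gradient, and a closed-form sub solver passed as a function.
solver(M, f, grad_f, sub_problem::Function) = :closed_form
# Objective form: everything further is collected in `rest...`.
solver(M, objective::Objective, rest...) = :objective

# `solver(M, Objective(), sin, cos)` matches both methods: the first is more
# specific in the last argument (`Function`), the second in the second one
# (`Objective`), so Julia reports a method ambiguity instead of picking either.
```

This is only a sketch of the dispatch issue, not of the actual solver signatures.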
All modified and coverable lines are covered by tests ✅
Project coverage is 99.76%. Comparing base (f546183) to head (37af2ae).
> But there is one case of unbounded parameters that I could not resolve. Maybe @mateuszbaran you can take a look? The problem with
I will rewrite it
> This was fun and I removed a few ambiguities already. The remaining ones I might think about, but they are a bit tricky, when we allow both `solver(M, f, grad_f, rest...; kwargs...)` as well as `solver(M, objective, rest...; kwargs...)`
This may be a bit hard to resolve.
You would also usually not mix them up, because that would basically require providing an objective, a problem, and another problem or such, so I am even fine with keeping the high-level interfaces on ignore. It is also just a few that have different formats to be called: difference of convex has a few different formats that "conflict" in an ambiguity you would in reality never hit.
> I will rewrite it
Oh, a very nice rewrite dispatching on that again :) Well done.
I checked both the `get_last_stepsize` and the higher-level ones.
The last step size would need a bit of a (breaking) rework, which I could do before the next breaking release.
For the high-level ones, I neither feel that would be easy to do nor that the ambiguity would be a problem in practice. One would basically have to really mix two of the documented forms and provide two sub-problem solvers, one in closed form, one as a problem, and not provide a state.
The reason I do not want to narrow down those parameters in type (which could also resolve the ambiguity) is that I do not want to restrict `f` or `grad_f` to, for example, `Function`.
`particle_swarm` should actually be fixed IMO, see this call:

```julia
M = Circle()
f(::Circle, p) = p * 2 * p
swarm = [-π / 4, 0, π / 4]
s = particle_swarm(M, ManifoldCostObjective(f), swarm)
@test s ≈ 0.0
```
I will take a look at other ones too.
EDIT: `stochastic_gradient_descent` and `difference_of_convex_proximal_point` have a similar issue with the `Number` type. The `truncated_conjugate_gradient_descent` ambiguity should be harmless, and `get_last_stepsize` should probably be redesigned to fix those ambiguities.
Ah, I see the point with particle swarm; the main tricky thing really seems to be that we aim to cover the single-number cases as well, but then it should hopefully be fine with just one extra case each.
For the last stepsize I agree; I would consider that function more internal though, but currently its signatures are a bit all-over-the-place and should be minimised/unified. I can check whether I see a way without breaking (too much).
I fixed the last stepsize. There were a few too many interim dispatches that caused trouble, so I removed an unnecessary interim one that was also only explicitly called in a test but used nowhere else. I documented the remaining ones more carefully and added an entry to the changelog about this.
I've taken a look at `particle_swarm`. The main problem are the two cases `particle_swarm(M, f, swarm)` versus `particle_swarm(M, mco, swarm)`, where for the first we have a method that also works if the swarm is a vector of numbers. For the second we do not, which leads to an ambiguity. The problem here is that while `mco` does carry a cost inside and we could use `get_cost_function` to "extract" it and make it suitable for numbers, we cannot "exchange" it in `mco`; at least I have no clue how to do that if the `mco` is, for example, "wrapped" in a cache.
Long story short, I can only resolve this by writing a `particle_swarm(M, mco, swarm::AbstractVector{T}) where {T<:Number}` that errors. The only alternative would be to remove the variant that works on numbers, but that would even break the tests we have on that.
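A minimal sketch of such an erroring method, using stand-in abstract types in place of the real ManifoldsBase/Manopt.jl ones so it is self-contained; the exact signature and message are illustrative:

```julia
# Stand-ins for ManifoldsBase.AbstractManifold and Manopt's objective supertype
# (illustration only; the real method would use the package's own types, so the
# signature also covers cached/decorated objectives).
abstract type AbstractManifold end
abstract type AbstractManifoldObjective end

# The ambiguity-resolving method: it exists only to fail with a readable message.
function particle_swarm(
    M::AbstractManifold,
    mco::AbstractManifoldObjective,
    swarm::AbstractVector{T};
    kwargs...,
) where {T<:Number}
    return error(
        "particle_swarm with an objective and a swarm of numbers is not supported; " *
        "wrap every number in an array, for example `[[s] for s in swarm]`.",
    )
end
```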
`difference_of_convex_proximal_point` has exactly the same problem, namely that the only mandatory thing is a single function, just that there it is `grad_h` (instead of `f`), but the effect is completely the same as for PSO.
I see, that's indeed rather unfixable. I'd suggest adding an error hint like here: https://github.com/JuliaManifolds/Manifolds.jl/blob/4cbf30f44654f0be462193fdea7a164a3d746264/src/Manifolds.jl#L581-L588 and keeping it an ambiguity.
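For reference, the pattern in the linked Manifolds.jl lines is based on `Base.Experimental.register_error_hint`; a hedged sketch for this case (the stand-in `particle_swarm` method, the argument checks, and the hint text are all illustrative):

```julia
# In a package this registration lives in `__init__`; here it is top-level so
# the sketch is self-contained. The stand-in method mimics the supported call.
particle_swarm(M, f, swarm::AbstractVector{<:AbstractArray}; kwargs...) = :ok

Base.Experimental.register_error_hint(MethodError) do io, exc, argtypes, kwargs
    # Fire only for the unsupported "objective + swarm of numbers" call.
    if exc.f === particle_swarm && length(argtypes) == 3 &&
            argtypes[3] <: AbstractVector{<:Number}
        print(
            io,
            "\nparticle_swarm with an objective and a swarm of numbers is not ",
            "supported; wrap every number in an array, e.g. `[[s] for s in swarm]`.",
        )
    end
end
```

The hint is appended whenever the resulting `MethodError` is displayed, so the ambiguity itself can stay.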
`truncated_conjugate_gradient_descent` does not only cover a few deprecated cases, but also one where it is hard to distinguish the (optional) Hessian and the optional initial value (which unfortunately is both a point `p` and the tangent vector `X`), such that `X` and the Hessian get into conflict.
That leads to 3 ambiguities (the other 3 are resolved by a breaking change, removing the deprecated ones).
I am not sure what to do about these, since we usually do not type `p` nor `X`, and while the Hessian is currently "recognized" by being a function, we already discussed that we maybe do not want that either. Though one could make functors subtype `Function`, that is maybe also restrictive. But then maybe the general question is:
If both a function and the initial point are optional, how do we best distinguish them? In the best of cases I would prefer to keep both positional, since that is "nice to use", but that is probably hard to get unambiguous. So an alternative (but breaking) change could be to move all optional functions (like here the Hessian) into (uniformly named) keyword arguments. (This would not resolve the two problems above, but well.)
> I see, that's indeed rather unfixable. I'd suggest adding an error hint like here: https://github.com/JuliaManifolds/Manifolds.jl/blob/4cbf30f44654f0be462193fdea7a164a3d746264/src/Manifolds.jl#L581-L588 and keeping it an ambiguity.
I have no clue how to write such a hint, but something like "doing PSO with an objective and a swarm consisting of numbers is not supported". One could of course also just start such an algorithm, assuming that a user providing an objective knows what they do and it will be the right one already, so one would just generate the "array swarm".
Ah, one thing that might help (but maybe that is also a bit silly interface-wise) is, for example, to make the initial point always a tuple, so one would call `difference_of_convex_proximal_point(M, grad_h, (p0,))`, which would remove the ambiguity with the Hessian, though not the f-vs-objective (or grad_h-vs-objective) one.
I would prefer keeping p0 as the (always last) positional argument though, I think, because we dispatch on that for number types at least; otherwise one could consider moving p0 to an `initial_value` keyword or so (but that would again be quite breaking as well).
> I have no clue how to write such a hint, but something like "doing PSO with an objective and a swarm consisting of numbers is not supported". One could of course also just start such an algorithm, assuming that a user providing an objective knows what they do and it will be the right one already, so one would just generate the "array swarm".
I will write that hint then.
> `truncated_conjugate_gradient_descent` does not only cover a few deprecated cases, but also one where it is hard to distinguish the (optional) Hessian and the optional initial value (which unfortunately is both a point `p` and the tangent vector `X`), such that `X` and the Hessian get into conflict.
OK, could you mention it in a comment in the test file for Aqua?
> If both a function and the initial point are optional, how do we best distinguish them? In the best of cases I would prefer to keep both positional, since that is "nice to use", but that is probably hard to get unambiguous. So an alternative (but breaking) change could be to move all optional functions (like here the Hessian) into (uniformly named) keyword arguments.
I would suggest making the Hessian a keyword argument and keeping the rest positional. 7 different call signatures in a docstring is a bit much, and it doesn't have the "no Hessian, no p" variant there, which should probably also work?
> I would suggest making the Hessian a keyword argument and keeping the rest positional. 7 different call signatures in a docstring is a bit much, and it doesn't have the "no Hessian, no p" variant there, which should probably also work?
That we should check carefully and then introduce for all solvers that need a Hessian but are fine with an approximation (or maybe even more generally: all solvers that require objective parts that are optional / can be "filled" with a good default).
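A hedged sketch of what such a unified signature could look like, with a hypothetical `some_solver` and a stand-in default for the approximate Hessian (Manopt.jl has `ApproxHessianFiniteDifference` for this; everything here is illustrative, not the final design):

```julia
# Stand-in for an approximate-Hessian default (illustration only, not a real
# finite-difference approximation).
approx_hessian(M, p, grad_f) = (X -> grad_f(M, p))

# Hypothetical unified form: mandatory functions positional, the start point as
# the last positional argument with a sensible default, optional functions
# (like the Hessian) as keywords so they can never be ambiguous with `p`.
function some_solver(
    M, f, grad_f, p = rand(M);
    hessian = approx_hessian(M, p, grad_f),
    kwargs...,
)
    # a real solver would run its iteration here; we just expose the parts
    return (p = p, hessian = hessian)
end
```

With this shape, `some_solver(M, f, grad_f)` and `some_solver(M, f, grad_f, p)` stay convenient, and a user-supplied Hessian is passed as `hessian = ...`.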
> Ah, one thing that might help (but maybe that is also a bit silly interface-wise) is, for example, to make the initial point always a tuple, so one would call `difference_of_convex_proximal_point(M, grad_h, (p0,))`, which would remove the ambiguity with the Hessian, though not the f-vs-objective (or grad_h-vs-objective) one. I would prefer keeping p0 as the (always last) positional argument though, I think, because we dispatch on that for number types at least; otherwise one could consider moving p0 to an `initial_value` keyword or so (but that would again be quite breaking as well).
Making the initial point a tuple looks awful to me, and since it is, I think, the most important optional parameter of every solver, I would try to avoid making it a keyword argument.
> That we should check carefully and then introduce for all solvers that need a Hessian but are fine with an approximation (or maybe even more generally: all solvers that require objective parts that are optional / can be "filled" with a good default).
Sure, that sounds like a good idea.
Ah, to be precise: `p` for TCG can never be optional, since it specifies the tangent space; the iterate is `X`, which is optional. I am not sure a random tangent space as a domain is something reasonable ;)
But I do agree, 7 signatures might be a few too many.
> Ah, to be precise: `p` for TCG can never be optional, since it specifies the tangent space; the iterate is `X`, which is optional. I am not sure a random tangent space as a domain is something reasonable ;)
There is such a signature there (the third one): `truncated_conjugate_gradient_descent(M, f, grad_f, Hess_f; kwargs...)`
> Making the initial point a tuple looks awful to me, and since it is, I think, the most important optional parameter of every solver, I would try to avoid making it a keyword argument.
I fully agree that that would be quite ugly, and I feel the idea "`p=rand(M)` is always the last positional argument" (for TCG: a random tangent vector ;)) is a very good design idea.
Then we should check for the next breaking release to make things like an optional Hessian a keyword argument. Some solvers already have `f` or such as keywords, where they can be filled reasonably or are only necessary for more debug opportunities.
I first thought this might just be a reasonable check to add; it turns out it even improves stability/reliability by rethinking a few choices that led to the ambiguities.
Is it ok if we do the rework of the interface (and its unification) in a new PR? I can also open an issue (and add that to the 1.0 roadmap) that sketches how to unify the high-level interfaces and how they accept functions.
I added tests with Aqua.jl. While we have a few ambiguity warnings that I for now ignored, they appear in different variants of the high-level solvers, which usually should not be a problem.
But there is one case of unbounded parameters that I could not resolve. Maybe @mateuszbaran you can take a look? The problem with
https://github.com/JuliaManifolds/Manopt.jl/blob/f5461838e7b6d03107522fcfe3a544aba306d222/src/plans/solver_state.jl#L415-L437
is that in case it is a vector of symbols, the two parameters `TTS, TPS` are declared but not used; I tried to localise them to put them into the union (the trick that otherwise helped), but then the code uses `TPS` and `TTS`, and that is not possible if they are local to the union. Is there any other good way to access the parameters you need?
I will keep this open for a while and maybe work a bit on the ambiguities, but maybe that would require a breaking change as well. At least for the last step size I will try to resolve that.
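For illustration, a minimal self-contained version of the pattern Aqua flags here (a hypothetical `set_parameter`, not the real code in the linked `solver_state.jl` lines): when the symbol-vector variant is hit, the type parameter is declared but never bound by dispatch, yet moving it into the `Union` would make it unavailable in the body.

```julia
# Unbound variant: for e::Vector{Symbol}, TPS is never constrained, which
# Aqua's unbound-args check reports. The body still needs TPS in the number
# case, so writing `Union{<:Number, AbstractVector{Symbol}}` is not an option.
function set_parameter(e::Union{TPS,AbstractVector{Symbol}}) where {TPS<:Number}
    return e isa AbstractVector ? :symbols : TPS
end

# Splitting into two methods binds every parameter, at the cost of duplicating
# whatever the two branches would otherwise share:
set_parameter2(e::AbstractVector{Symbol}) = :symbols
set_parameter2(e::TPS) where {TPS<:Number} = TPS
```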