JuliaManifolds / Manopt.jl

๐Ÿ”๏ธManopt. jl โ€“ Optimization on Manifolds in Julia
http://manoptjl.org

Modularise Constraints #386

Closed kellertuer closed 3 months ago

kellertuer commented 4 months ago

This is a start to rework constraints, make them a bit more flexible and address / resolve #185.

All 3 parts, the function $f$, the inequality constraints $g$, and the equality constraints $h$, are now stored in their own objectives internally. That way, they can be provided more flexibly – for example, with a Hessian for one of them.

Besides that, more details are available on the co-domain of these functions; especially in the case of a single function, the gradient can now also be specified to map into the tangent space of the power manifold.
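To illustrate the two co-domain conventions, here is a minimal sketch using ManifoldsBase.jl; the constraints and their gradients are made up for illustration:

```julia
using ManifoldsBase
M = ManifoldsBase.DefaultManifold(3)
pM = PowerManifold(M, NestedPowerRepresentation(), 2) # power manifold for 2 constraints
p = [1.0, 2.0, 3.0]
# the gradients of two (made-up) constraints as a vector of tangent vectors …
grad_g(M, p) = [2 .* p, [-1.0, 0.0, 0.0]]
# … which, in the nested representation, can at the same time be read as a
# single tangent vector X on pM whose j-th component is addressed as X[pM, j]
X = grad_g(M, p)
X[pM, 1] == 2 .* p # true
```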

One open point still to check is how the internal functions can be adapted to this in a hopefully non-breaking way. A main challenge (and where I stopped today) is, when a function that returns gradients (or a gradient of functions) is stored in an objective – and hence might be cached – how to best model element access. This might also be a nice improvement on (or a unification with) the LevenbergMarquardt-type objective. One could maybe unify that into an objective of type VectorObjective.

Comments on this idea welcome.

🛣️ Roadmap

kellertuer commented 3 months ago

Now the main parts left seem to be the interaction with the embedded objective and caching; the rest are internal tests of the objective itself that we have to adapt, it seems, and a bit of Aqua that I have not yet looked at.

mateuszbaran commented 3 months ago

Yes, that is the case, since co.inequality_constraints is a VectorGradientFunction vgf, which has a cost (vector) and a gradient (vector of them). So evaluating one of the constraints is evaluating the cost of said vgf.

edit: One large advantage of this is that we implement all the access to this only once (not once for equality and once for inequality constraints – in the end maybe even the same for the Jacobian).

Then maybe it would be better to call it get_value instead of get_cost? After all, inequality constraints are not costs.
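As a sketch of what such a rename could look like (all names and the `costs` field here are assumptions, not the actual implementation):

```julia
# the vectorial function exposes its evaluation under a neutral name …
get_value(M, vgf, p, i) = vgf.costs(M, p)[i]
# … and constraint access simply delegates to it
get_inequality_constraint(M, co, p, i) = get_value(M, co.inequality_constraints, p, i)
```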

kellertuer commented 3 months ago

Sure, I do not mind. Maybe it is even better to emphasise that it is indeed not a cost, since we would expect costs to be real-valued.

kellertuer commented 3 months ago

I changed the name; the main things missing are the current errors (and the ones I am stuck with).

The larger issues after these are Cache and Count, which currently still error a bit, but those three I am again totally lost on, having spent 1.5 hours without any progress. For cache/count: I think I do understand why both interactions fail and how to fix them; they probably just need a bit of internal restructuring.

mateuszbaran commented 3 months ago

I will try to fix those iterate issues.

kellertuer commented 3 months ago

Thanks! I will try the caches next (tomorrow, hopefully); they are a bit more technical than the counting, but I think I know what to do.

mateuszbaran commented 3 months ago

It looks like I've fixed everything except passthrough and caches:

Test Summary:                                        | Pass  Fail  Error  Total   Time
Constrained Plan                                     |  180     2     11    193  11.6s
  Partial Constructors                               |   10                  10   0.7s
  ConstrainedManifoldObjective{AllocatingEvaluation} |   23                  23   0.1s
  ConstrainedManifoldObjective{InplaceEvaluation}    |   23                  23   0.3s
  ConstrainedManifoldObjective{AllocatingEvaluation} |   23                  23   0.2s
  ConstrainedManifoldObjective{InplaceEvaluation}    |   23                  23   0.2s
  Augmented Lagrangian Cost & Grad                   |   12                  12   2.2s
  Exact Penalties Cost & Grad                        |   16                  16   2.3s
  Objective Decorator passthrough                    |                 4      4   1.8s
  Count Objective                                    |   34                  34   0.7s
  Cache Objective                                    |   12     2      7     21   2.5s

kellertuer commented 3 months ago

Interesting that it took just so few lines to fix. From what I read in the errors, I think it would have taken me a few hundred. I also understand exactly nothing of the new function, so thanks for providing that. The code is indeed now more clever than I am. Can you maybe document it a bit?

mateuszbaran commented 3 months ago

Julia's indexing interface is incredibly expressive but also quite difficult to understand beyond basic use cases. So there are sometimes magic solutions like this one, and actually the only reason I could do it so quickly is that I've already spent dozens of hours with other indexing code :sweat_smile: .

I will try to add some documentation.

kellertuer commented 3 months ago

Perfect, good that I said I was stuck and let you resolve it then. The only error I do not yet fully understand is the "no iterative get_constraints" thing. But I will check what best to do there and work on the caches as soon as I find time.

mateuszbaran commented 3 months ago

The only error I do not yet fully understand is the "no iterative get_constraints" thing.

How can I reproduce that? I don't see it among the currently failing tests.

kellertuer commented 3 months ago

Well, I just ran the tests on Julia 1.10 and wanted to show that here, but the tests here currently seem to be weird, claiming we do not import Base.length though we definitely do.

Here is the full error I get

Cache Objective: Error During Test at /Users/ronnber/Repositories/Julia/Manopt.jl/test/plans/test_constrained_plan.jl:322
  Test threw exception
  Expression: get_constraints(M, cofa, p) == get_constraints(M, cccofa, p)
  MethodError: no method matching iterate(::typeof(get_constraints))

  Closest candidates are:
    iterate(::Base.MethodSpecializations)
     @ Base reflection.jl:1148
    iterate(::Base.MethodSpecializations, ::Nothing)
     @ Base reflection.jl:1154
    iterate(::Base.MethodSpecializations, ::Int64)
     @ Base reflection.jl:1155
    ...

  Stacktrace:
    [1] in(x::Symbol, itr::Function)
      @ Base ./operators.jl:1292
    [2] firstcaller(bt::Vector{Union{Ptr{Nothing}, Base.InterpreterIP}}, funcsyms::Function)
      @ Base ./deprecated.jl:165
    [3] macro expansion
      @ ./deprecated.jl:132 [inlined]
    [4] macro expansion
      @ ./logging.jl:388 [inlined]
    [5] depwarn(msg::String, funcsym::Function; force::Bool)
      @ Base ./deprecated.jl:127
    [6] depwarn(msg::String, funcsym::Function)
      @ Base ./deprecated.jl:121
    [7] get_constraints(M::ManifoldsBase.DefaultManifold{โ„, Tuple{Int64}}, co::ConstrainedManifoldObjective{AllocatingEvaluation, ManifoldGradientObjective{AllocatingEvaluation, var"#f#117", var"#grad_f#118"}, Manopt.VectorGradientFunction{AllocatingEvaluation, Manopt.FunctionVectorialType, Manopt.FunctionVectorialType, var"#h#129", var"#grad_h#131", Int64}, Manopt.VectorGradientFunction{AllocatingEvaluation, Manopt.FunctionVectorialType, Manopt.FunctionVectorialType, var"#g#120", var"#grad_g#121", Int64}}, p::Vector{Float64})
      @ Manopt ~/Repositories/Julia/Manopt.jl/src/plans/constrained_plan.jl:268
    [8] macro expansion
      @ ~/.julia/juliaup/julia-1.10.3+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:669 [inlined]
    [9] macro expansion
      @ ~/Repositories/Julia/Manopt.jl/test/plans/test_constrained_plan.jl:322 [inlined]
   [10] macro expansion
      @ ~/.julia/juliaup/julia-1.10.3+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
   [11] macro expansion
      @ ~/Repositories/Julia/Manopt.jl/test/plans/test_constrained_plan.jl:308 [inlined]
   [12] macro expansion
      @ ~/.julia/juliaup/julia-1.10.3+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
   [13] top-level scope
      @ ~/Repositories/Julia/Manopt.jl/test/plans/test_constrained_plan.jl:6

But on the current commit I also have 3 fails and 9 errors instead of your 2 fails and 7 errors left on the cache objective. So while tests should be reproducible, we currently seem to have 3 different variants here. Very strange.

kellertuer commented 3 months ago

Hm, but I can reproduce the error on the Julia 1.9 we have on CI here as well: though line 11 specifically has length among the things to import, Julia 1.9 complains that length was not imported 🫠

mateuszbaran commented 3 months ago

I can locally reproduce that Base.length error on 1.9 but not your iterate one. Both look weird.

mateuszbaran commented 3 months ago

Let's see if CI can reproduce your problem.

kellertuer commented 3 months ago

Yes, this time it's your machine doing something magic in that you have fewer errors; maybe you fixed them but did not commit that?

mateuszbaran commented 3 months ago

It looks like I've been running tests without depwarns, while you and CI had them turned on. Anyway, I've fixed the issue.

kellertuer commented 3 months ago

Ah, I see, I probably misunderstood how to use depwarn then. But good to know that this was the issue; I have to change those lines anyway, since that method shall be deprecated.
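For reference, the failure mode visible in the stack trace above can be reduced to how Base.depwarn is called (a hedged sketch; get_constraints stands in for the deprecated method):

```julia
# Base.depwarn takes the deprecated function's *name* as a Symbol. Passing the
# function object itself is accepted by the signature, but once depwarns are
# enabled (e.g. `--depwarn=yes`, as on CI), `firstcaller` tries to iterate it.
Base.depwarn("`get_constraints` is deprecated", :get_constraints)  # correct
# Base.depwarn("`get_constraints` is deprecated", get_constraints)
# -> MethodError: no method matching iterate(::typeof(get_constraints))
```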

mateuszbaran commented 3 months ago

I've checked NonlinearLeastSquaresObjective and I don't actually see much benefit in unification with VectorGradientFunction. There is a bit of common logic and some potential for support of additional Jacobian/gradient types, but it doesn't look particularly important to me.

kellertuer commented 3 months ago

I've checked NonlinearLeastSquaresObjective and I don't actually see much benefit in unification with VectorGradientFunction. There is a bit of common logic and some potential for support of additional Jacobian/gradient types, but it doesn't look particularly important to me.

Semantically, the gradient on the power manifold and the Jacobian should be really similar. I thought your initial idea even came from that: that get_vector on the power manifold of the Jacobian would yield the new gradient representation we now have. Semantically I see a lot of similarities, but I have not yet checked how useful that is for the code.

If it is not useful, we can also leave out the CoefficientRepresentation for now, since that was meant to be used for the Jacobian.

mateuszbaran commented 3 months ago

Semantically, the gradient on the power manifold and the Jacobian should be really similar. I thought your initial idea even came from that: that get_vector on the power manifold of the Jacobian would yield the new gradient representation we now have. Semantically I see a lot of similarities, but I have not yet checked how useful that is for the code.

If it is not useful, we can also leave out the CoefficientRepresentation for now, since that was meant to be used for the Jacobian.

CoefficientRepresentation is still needed for constraints, and that was my primary point. It might be used for least squares as well, but least squares has it built-in already, and there are probably something like 10 lines of logic in the middle of an existing function that are duplicated. We could remove that duplication, but it would be of no real benefit to least squares without additional work being done there. Maybe it would make sense to restructure least squares while resolving #332.

kellertuer commented 3 months ago

Ok, then let's postpone LM until we/someone works on #332.

kellertuer commented 3 months ago

Hm, I have now thought about this for about an hour, but I am conceptually stuck on the caches – mainly conceptually and related to indices again. Here are my two problems, which again seem more clever than myself:

  1. The old system had two caches: :GradEqualityConstraints and :GradEqualityConstraint. The first one caches what is now the j::Colon scenario, the other the single evaluations (j::Integer). Would this distinction still be useful? I think so, since the function-evaluation variant would benefit quite a bit from the first case, so this is really just a question of usefulness (see the sketch after this list). Also, storing the single j::Integer for the array-representation power manifold is maybe a bit tricky; I am not yet so sure, but my main problem is the design decision here.

  2. My main problem are again all the other j addressings. Those could check either of the caches above to access cached values (first the full one and return a subset, otherwise collect the single ones?); but for collecting the single ones, I fear I need some trick like the _to_iterable_indices one, with the only problem that, from the current objective, the array I would need is cache.objective.equality_constraints.jacobian!!, and to safely know that that exists, the dispatch types would take several lines. So how would I get from any j to such an iterable here, then? I am not able to do that.

Sorry for so many questions. I fear the originally simple idea I had to realise this turned into something super technical, and with the new indices tricks into something I am not able to handle.
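For illustration, the two caches from point 1 could look roughly like this; a minimal sketch assuming plain Dicts (the actual cache objective may use LRU caches), with P the point type and T a tangent-vector type:

```julia
struct TwoLevelGradientCache{P,T}
    full::Dict{P,Vector{T}}        # the j::Colon evaluation, keyed by the point p
    single::Dict{Tuple{P,Int},T}   # single j::Integer evaluations, keyed by (p, j)
end
```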

mateuszbaran commented 3 months ago

Hm, how do you remember in the old scheme which single constraint has been cached?

Sorry for so many questions. I fear the originally simple idea I had to realise this turned into something super technical, and with the new indices tricks into something I am not able to handle.

No problem, I think this improvement is worth the work put into it. I will try to help with the caches.

kellertuer commented 3 months ago

For the first, I think it is fine to keep that; it might be beneficial for the function one.

I fixed all tests, but caching does not yet work for the fancy new indexing as described in open point 2 for now (I added function dummies with errors). And a similar problem (and its testing) is still missing for counting single gradient evaluations.

kellertuer commented 3 months ago

Hm, how do you remember in the old scheme which single constraint has been cached?

The old scheme was easy; I will describe it directly in the new notation:

get_grad_equality_constraint(M, p, i::Integer) caches with key (p, i) into the single-gradient cache (n repetitions, hence the i in the key).

get_grad_equality_constraint(M, p, I::Colon) caches with key (p) but caches the whole power-manifold tangent vector (array representation not yet tested).

Those two now also work exactly that way, though this might basically lead to a double-cached tangent vector (but it potentially did that before as well).

An idea might be for any range i

My main problem is that I can access the number of constraints, but I can not get a _to_iterable_indices thing there, since I can not access the AbstractArray you use some magic on (as mentioned, it is an a.b.c.v.d.f.c.d.f.c.d.c.d.f (subsubsubsubsubsubsubsubstructure thingy) at that point, basically).

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.77%. Comparing base (2a7cc0d) to head (d7f1205).

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##           master     #386     +/-  ##
========================================
  Coverage   99.76%   99.77%           
========================================
  Files          71       72       +1  
  Lines        7224     7491     +267  
========================================
+ Hits         7207     7474     +267  
  Misses         17       17           
```


mateuszbaran commented 3 months ago

_to_iterable_indices can actually be called on any vector of the right length (with as many entries as the number of constraints).
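For illustration with the analogous Base machinery (not Manopt's internal helper): converting an arbitrary index only needs something of the right shape.

```julia
dummy = Vector{Nothing}(undef, 10)          # stands in for "10 constraints"
Base.to_indices(dummy, (3:2:7,))            # -> (3:2:7,)
mask = falses(10); mask[[3, 5]] .= true     # a BitVector index works, too
collect(Base.to_indices(dummy, (mask,))[1]) # -> [3, 5]
```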

By the way, I have a small question right now. If I want to, for example, evaluate the gradients of constraints 3, 5 and 10 in-place, should they be saved at indices 3, 5 and 10 of the given array, or at 1, 2, 3? I would say 3, 5, 10 is more consistent, but it looks like you attempt in some places to do 1, 2, 3? Sure, it might use less memory, but actually obtaining that benefit would be a large follow-up project, so it might not be worth making our lives harder right now.

kellertuer commented 3 months ago

By the way, I have a small question right now. If I want to, for example, evaluate the gradients of constraints 3, 5 and 10 in-place, should they be saved at indices 3, 5 and 10 of the given array, or at 1, 2, 3? I would say 3, 5, 10 is more consistent, but it looks like you attempt in some places to do 1, 2, 3? Sure, it might use less memory, but actually obtaining that benefit would be a large follow-up project, so it might not be worth making our lives harder right now.

They should be saved at 3, 5, 10 – and for single-index access they are (as they were before). If you do I=[3,5,10], that is exactly the magic we are missing. I am not sure how to get from any arbitrary indexing thingy (not sure what that is; for me it is all just magic) to the iterable. It could also be a BitArray or whatever.

kellertuer commented 3 months ago

I currently attempt nothing on those – they all have error messages in there, since I have no clue how to do this. For the best of cases, as I said:

1) Get the length of [3,5,10] and an iterable.
2) Check whether the : cache exists, and return that_cache[3,5,10] (maybe with a copy for the in-place variant).
3) If : has no cache for that p, check which single indices have caches. Assume that is [3,10]; then take those from the cache.
4) For the remaining ones (in the example [5]), call the getter with this indexing array and store them in the result we started in 3).
5) Cache all results from 4) in the corresponding indices.
6) Done.

The only steps I roughly know how to do are 2) and 6) (see the sketch below).
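As a control-flow sketch of the six steps above for the allocating case; all helper names (iterable_js, evaluate_gradients) and the cache fields are hypothetical stand-ins, not Manopt.jl's API:

```julia
function cached_gradients(M, cache, p, i)
    js = collect(iterable_js(i))                        # step 1: e.g. [3, 5, 10]
    haskey(cache.full, p) && return cache.full[p][js]   # step 2: full (:) cache hit
    X = Vector{Any}(undef, length(js))                  # result uses positions 1:length(js)
    missing_js = Int[]
    for (k, j) in enumerate(js)                         # step 3: collect single-cache hits
        if haskey(cache.single, (p, j))
            X[k] = cache.single[(p, j)]
        else
            push!(missing_js, j)
        end
    end
    Ys = evaluate_gradients(M, p, missing_js)           # step 4: evaluate the rest at once
    for (Y, j) in zip(Ys, missing_js)
        X[findfirst(==(j), js)] = Y
        cache.single[(p, j)] = Y                        # step 5: store them in the cache
    end
    return X                                            # step 6: done
end
```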

mateuszbaran commented 3 months ago

They should be saved at 3, 5, 10 – and for single-index access they are (as they were before).

Then why is there n = _vgf_index_to_length(i, vgf.range_dimension) instead of n = vgf.range_dimension?

I am not sure how to get from any arbitrary indexing thingy (not sure what that is; for me it is all just magic) to the iterable. It could also be a BitArray or whatever.

_to_iterable_indices does exactly that but you need to tell it what is going to be indexed, or give it something with the same shape.

kellertuer commented 3 months ago

Then why is there n = _vgf_index_to_length(i, vgf.range_dimension) instead of n = vgf.range_dimension?

Where is "there"? I basically rewrote all of the cache functions in the last 2 hours. The only ones that work are single index and : – both should not use that. The others have error calls in them, maybe first ideas, but I got lost in that, so maybe there is a fragment of that in there, sure. But I think I have now written often enough that I have no clue what to do here.

edit: To be more precise, the _vgf_index thingy is indeed the 1,2,3, but I need that when returning the result (without cache, or filled in by the cache), since if the gradients get called with [3,5,10], the result is just a tangent vector on a size-3 power manifold, not a size-n one. So the result has these 1,2,3 – the cache should of course store at 3,5,10; everything else would be total chaos.
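Spelled out, the two index systems in the edit above (pM3 denotes the 3-fold power manifold the result lives on; a sketch, not Manopt's API):

```julia
i = [3, 5, 10]
n = length(i)   # what `_vgf_index_to_length` yields for a vector index
# the result X is a tangent vector on the 3-fold power manifold pM3, so its
# entries are addressed as X[pM3, 1], X[pM3, 2], X[pM3, 3], while the cache
# stores them under the original constraint indices 3, 5, and 10
```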

kellertuer commented 3 months ago

_to_iterable_indices does exactly that but you need to tell it what is going to be indexed, or give it something with the same shape.

Yes, and you call it with vgf.jacobian!! (which might even be a function).

When I am in the cache, this is get_objective(cache_obj, true).inequality_constraints.jacobian!!, but also only if the user does not invent a better vectorial function, so I would have to super-type-dispatch-constrain that function to be such that I can reach that Jacobian. I would not know what else to take instead.

kellertuer commented 3 months ago

The newest commit is how far I got with the 6 steps above.

My main problem is that, when I check the single-constraints cache, I have to iterate over the j in i (line 449), but at that point I do not have any array to pass as the first argument to the _to_iterable_indices function. I also have no idea where to get such an array from.

If we manage that one function, I think I can adapt all the others in that file and even adapt the counting accordingly.

mateuszbaran commented 3 months ago

I will check it during the weekend.

mateuszbaran commented 3 months ago

I'm trying to fix it right now... is there a function that tells how many equality constraints a given objective has?

kellertuer commented 3 months ago

Cool. For now there is not (yet), but we could surely implement one (it would pass through to the VectorGradientFunction's length() call). What would be a good name?

mateuszbaran commented 3 months ago

Maybe num_equality_constraints? I think I need that to fix the issue with caching.

kellertuer commented 3 months ago

That was also my idea, though I do not like the abbreviation num upfront. But as long as we do not have another idea, that is what we take :)
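A one-line sketch of the accessor being discussed; the final name is still open, and the `equality_constraints` field is an assumption:

```julia
equality_constraints_length(co) = length(co.equality_constraints)
```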

mateuszbaran commented 3 months ago

I think equality constraints should more or less work right now after my latest commit, up to my one TODO. Let me know if anything is broken.

kellertuer commented 3 months ago

That looks good, and it should even also work for :Colon (which currently has a form of its own before this). I will think about a name later today and continue. As I said, I hope that is the main thing missing besides testing and documentation.

kellertuer commented 3 months ago

I could not omit the :Colon case, since those are the only ones that can actually cache the full constraint evaluation (the others that are not an integer can only read from it).

kellertuer commented 3 months ago

I think I have now implemented all cases. What is missing is testing and maybe a bit of documentation. We are closing in on this (and, as so often, it took a bit longer than I thought).

kellertuer commented 3 months ago

First thing I noticed when starting with testing: the new version of

https://github.com/JuliaManifolds/Manopt.jl/blob/667781dfce09d89741c0a7fe148836e7fc10c36c/src/plans/augmented_lagrangian_plan.jl#L117-L125

leads to an index out of bounds in the Tutorial (not yet in testing, but testing also does not yet cover ranges that much).

edit 1: I think I found it. One other thing I noticed – I still often use X[pM, j] to write to X while you use _write – should I switch to that as well?

edit 2: Yay, found it! The problem was that we have 2 indexing things: if you ask to access constraints [1,3,4] as a range, we still only return a tangent vector on the 3-power manifold (not the 4 one), so the single indices thereon are 1,2,3. I adapted that in all necessary places, I hope. Still, if you could check the X[pM,1] things, that would be great (I am not so sure about differences and effects).

mateuszbaran commented 3 months ago

edit 1: I think I found it. One other thing I noticed – I still often use X[pM, j] to write to X while you use _write – should I switch to that as well?

edit 2: Yay, found it! The problem was that we have 2 indexing things: if you ask to access constraints [1,3,4] as a range, we still only return a tangent vector on the 3-power manifold (not the 4 one), so the single indices thereon are 1,2,3. I adapted that in all necessary places, I hope. Still, if you could check the X[pM,1] things, that would be great (I am not so sure about differences and effects).

I think we should switch to _write everywhere. It's more generic.

Regarding access like [1, 3, 4], I think basically all places that do that should do it in-place and provide a sufficiently large array so that the output is indeed written at indices [1, 3, 4] and not [1, 2, 3]. For consistency the allocating variant should do the same IMO.

kellertuer commented 3 months ago

I think we should switch to _write everywhere. It's more generic.

I have no clue what the actual differences are, but I can check and change it.

Regarding access like [1, 3, 4], I think basically all places that do that should do it in-place and provide a sufficiently large array […]

I disagree. That would lead to much more memory usage and basically-empty entries, which I feel is much too error-prone. Even more, if you call get_inequality_constraint(M, ...., 1:3), you would for consistency expect a vector of length 3, not of length n.

kellertuer commented 3 months ago

For consistency the allocating variant should do the same IMO.

But then this is fully different from a vector subsample v[2:3], which I would consider both variants to be equal to. v[2:3] gives you a vector of length 2. We should do the same for such an indexing – and we actually do now.

mateuszbaran commented 3 months ago

I disagree. That would lead to much more memory usage and basically-empty entries, which I feel is much too error-prone. Even more, if you call get_inequality_constraint(M, ...., 1:3), you would for consistency expect a vector of length 3, not of length n.

For consistency the allocating variant should do the same IMO.

But then this is fully different from a vector subsample v[2:3], which I would consider both variants to be equal to. v[2:3] gives you a vector of length 2. We should do the same for such an indexing – and we actually do now.

The allocating variant should actually never be used in practice, especially when memory usage is of concern, and the in-place variant needs the maximum capacity anyway. This is going to be error-prone either way, because using the allocating variant requires compacting indices while the in-place one does not. But sure, at least your approach mirrors array indexing, so I can see how it looks consistent.

julia> X = rand(5)
5-element Vector{Float64}:
 0.768801628457022
 0.2125959599454933
 0.22309372971963248
 0.4219945781972464
 0.253065406198804

julia> Y = similar(X)
5-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0

julia> Y[2:3] .= X[2:3]
2-element view(::Vector{Float64}, 2:3) with eltype Float64:
 0.2125959599454933
 0.22309372971963248

julia> Y
5-element Vector{Float64}:
 0.0
 0.2125959599454933
 0.22309372971963248
 0.0
 0.0

julia> X[2:3]
2-element Vector{Float64}:
 0.2125959599454933
 0.22309372971963248

kellertuer commented 3 months ago

Hm, I had hoped we were slowly coming to some convergence here. That seems not to be the case. I totally disagree.

The allocating variant should be totally fine and work the same way as the in-place variant. I understand get_grad_inequality_constraint!(M, X, co, p, 1:2) as the full right-hand side of X .= [ gradient of g_j evaluated at p for j=1:2 ].

The right-hand side is a vector of length 2. If you just ask for 1:2, I feel it is super strange to eventually get a thousand constraints back, or to have to pass memory for a thousand gradients in X. Also, the integer ones (which should basically act the same as the [j,] variants) work on single gradient vectors.
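As a sketch of these semantics with a stand-in Euclidean gradient (all names illustrative): a request for 1:2 fills exactly two gradients, independent of the total number of constraints n.

```julia
grad_g(M, p, j) = j .* p          # stand-in for the gradient of the j-th constraint
function grad_g!(M, X, p, js)     # fill X with exactly the requested gradients
    for (k, j) in enumerate(js)
        X[k] .= grad_g(M, p, j)
    end
    return X
end
p = [1.0, 2.0]
X = [similar(p) for _ in 1:2]
grad_g!(nothing, X, p, 1:2)       # X == [1 .* p, 2 .* p]; length 2, not n
```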

kellertuer commented 3 months ago

And yes, the idea is that this is the same as array indexing. We use an array-indexing index j by now and have a vector of gradients. So if we did something other than vector-indexing behaviour, I would not have a good argument as to why – especially since the alternative might have a super huge memory overhead, for example with a million constraints where we have an algorithm that at every point in time only accesses and has to store two gradients.