JuliaManifolds / Manopt.jl

🏔️ Manopt.jl – Optimization on Manifolds in Julia
http://manoptjl.org

Modularise Constraints #386

Closed kellertuer closed 3 months ago

kellertuer commented 4 months ago

This is a start to rework constraints, make them a bit more flexible and address / resolve #185.

All three parts – the function $f$, the inequality constraints $g$, and the equality constraints $h$ – are now stored in their own objectives internally. That way, they can be provided more flexibly – for example, with a Hessian for one of them.
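As a rough sketch of the idea (names here are illustrative and simplified, not the actual Manopt.jl types), the constrained objective could wrap three independent objectives:

```julia
# Simplified sketch: each part is its own objective, so e.g. a Hessian
# could be attached to just one of them. All names are hypothetical,
# not the actual Manopt.jl API.
struct SimpleObjective{F,G}
    cost::F       # the function itself
    gradient::G   # its gradient, may be `nothing` if not provided
end

struct SketchConstrainedObjective{TF,TG,TH}
    f::TF   # cost objective
    g::TG   # inequality constraints g(p) ≤ 0, their own objective
    h::TH   # equality constraints h(p) = 0, their own objective
end

f_obj = SimpleObjective(p -> sum(abs2, p), p -> 2 .* p)
g_obj = SimpleObjective(p -> [p[1] - 1.0], nothing)
h_obj = SimpleObjective(p -> [sum(p)], nothing)
co = SketchConstrainedObjective(f_obj, g_obj, h_obj)

co.f.cost([1.0, 2.0])  # 5.0
```

Each sub-objective can then be cached or decorated independently, which is the flexibility mentioned above.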

Besides that, more details are available on the co-domain of these; especially in the case of a single function, the gradient can now also be specified to map into the power manifold's tangent space.

One open point still to check is how the internal functions can be adapted to this in a hopefully non-breaking way. A main challenge (and where I stopped today) is: when a function that returns gradients (or a gradient of functions) is stored in an objective – and hence might be cached – how to best model element access. This might also be a nice improvement on (or a unification with) the LevenbergMarquardt-type objective. One could maybe unify that into an objective of type VectorObjective.

Comments on this idea welcome.

🛣️ Roadmap

mateuszbaran commented 3 months ago

Then maybe let's ask about it on Slack and go with whatever gets more votes?

kellertuer commented 3 months ago

Hm, to zero extent do I understand any argument for your point.

Feel free to ask; from the current activity and how many people I have seen using Manopt (carefully phrased: not really many) – no one will answer besides us.

mateuszbaran commented 3 months ago
  • It deviates from usual vector indexing

I demonstrated how it does not; it corresponds to a different indexing scheme. I'm used to doing basically everything in-place, so that's what I'm defaulting to.

  • it uses much more memory

Only in cases where we never need to evaluate more than a small subset of constraints.

  • anyone who would use any subsampling would probably also only use these, so why return more / require more?

To not have to reallocate memory when more is needed.

Feel free to ask; from the current activity and how many people I have seen using Manopt (carefully phrased: not really many) – no one will answer besides us.

People might have encountered a similar problem in other optimization libraries so they may answer even if they don't use Manopt specifically.

kellertuer commented 3 months ago

Then I still do not understand your argument, nor your demonstration. The code you posted I read mainly as a reason for my proposal. I really see zero advantages in your approach and only confusion on the user side: if they ask for 2 gradients, they still have to pass in memory for a million of them. That seems super inconsistent to me.

kellertuer commented 3 months ago

And sure feel free to ask on Slack; I would expect zero feedback to be honest.

kellertuer commented 3 months ago

Since my next step would be to write extensive tests for exactly these addressing things, I will stop for now.

To be honest, your proposal is so far from what I expected that I never had it in mind. To me it really makes no sense; you could pass a view into `X` if you want to update parts of your memory, but all other cases would work on “reduced constraints” anyway.

mateuszbaran commented 3 months ago

I can see some advantages to your proposal, it just isn't particularly natural to me.

```julia
X .= [ gradient j evaluated at p for j=2:3 ]
```

This is not really allowed if `X` has a different size than the number of evaluated constraints. We can either do

```julia
X[1:2] .= [ gradient j evaluated at p for j=2:3 ]
```

which is your proposal, or

```julia
X[2:3] .= [ gradient j evaluated at p for j=2:3 ]
```

which is mine. I think you should at least see from this why mine looks more consistent to me than yours (we use the same indexing vector twice instead of two different ones).
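To make the two conventions concrete, here is a toy sketch with plain vectors (`grads` stands in for the evaluated constraint gradients; all names are illustrative):

```julia
# Toy stand-in: "gradients" of constraints 1:3, here just numbers.
grads = [10.0, 20.0, 30.0]
requested = 2:3                  # we ask for constraints 2 and 3

# Convention A: output memory has the *reduced* size; the k-th
# requested gradient lands at position k of the output.
X_a = zeros(length(requested))
X_a[1:length(requested)] .= grads[requested]   # X_a == [20.0, 30.0]

# Convention B: output memory has the *full* size; gradient j lands
# at index j, and untouched entries keep their old value.
X_b = zeros(length(grads))
X_b[requested] .= grads[requested]             # X_b == [0.0, 20.0, 30.0]
```

Convention B reuses the same index range on both sides of the assignment, while convention A only allocates as much memory as was requested.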

kellertuer commented 3 months ago

That example at least makes clear why you feel that yours is more consistent, but it is probably mainly a question of what one is used to.

I do not agree with your first line, though. That is misleading and not what I meant. That line is a

```julia
reduced_X = [ ... ]
```

Reduced forms are not allowed in yours; you would always need memory for the full constraints. The center line is a misinterpretation of the reduced form.

What your form actually assigns in the `=` is also a vector of length 2 (not one of length 3 or 10k).

So to summarise:

kellertuer commented 3 months ago

Or phrased differently: your variant would always require an `X = zeros(10^17)` to be allocated.

For the in-place variant, mine allows for both:

`get_constraint!(X_reduced, co, x, 2:3)` can work on memory of size 2, while `get_constraint!(@view(X[2:3]), co, x, 2:3)` works for the large `X` from above. So one can use both.

The first variant is not possible in your version.
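A minimal sketch of how one in-place method can serve both call patterns (a hypothetical, heavily simplified `get_constraint!` where `co` is just a vector of scalar constraint functions – not the actual Manopt.jl signature):

```julia
# Hypothetical toy version: evaluate the constraints in `range` at `x`
# into whatever memory `X` provides (reduced array or a view).
function get_constraint!(X, co, x, range)
    for (k, j) in enumerate(range)
        X[k] = co[j](x)
    end
    return X
end

co = [x -> x + 1.0, x -> 2x, x -> x^2]   # three toy scalar constraints
x = 3.0

# Variant 1: work on memory of the reduced size 2.
X_reduced = zeros(2)
get_constraint!(X_reduced, co, x, 2:3)    # X_reduced == [6.0, 9.0]

# Variant 2: write into a slice of a large preallocated X via a view.
X = zeros(10)
get_constraint!(view(X, 2:3), co, x, 2:3) # X[2:3] == [6.0, 9.0]
```

The view makes the second call equivalent to the full-memory convention, so nothing is lost by defaulting to the reduced one.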

mateuszbaran commented 3 months ago

I see your point and there is already some support for your proposal on Slack :slightly_smiling_face: . As long as it is clearly documented I'm fine with using your approach.

kellertuer commented 3 months ago

As a compromise:

We could realise your variant once we have a sparse power manifold representation, and fill values once the in-place variant gets such a type? Otherwise it is also not too hard to write an `update_grad_equality_constraint!()` for your case (if you have a better name, feel free to propose one).

Both get and update would just call the same (internal) function that has the indexing (maybe the index set? Or a symbol?) as an additional parameter.

mateuszbaran commented 3 months ago

Given the support on Slack I think it's better to just go with your proposal, and I will just be careful around constraint indexing. I don't have any plans regarding sparse power manifolds anyway.

kellertuer commented 3 months ago

Ok, that is of course also fine with me.

Besides a few small vectorial Jacobian tests, the main test coverage missing is Cache/Count on the new access indices (integer arrays or bit vectors). For the docs, I gave Vale a new try and will work a bit on that, since it seems to work more stably now.

kellertuer commented 3 months ago

Test coverage is nearly done – it seems I just missed a few single cache miss cases (will do one final empty-cache round of tests when I see which ones).

Then only careful docs-reading and adding `range=` to the high-level interfaces (of ALM and EPM) are missing.

kellertuer commented 3 months ago

I think this is already finished to the extent that the only things missing are

kellertuer commented 3 months ago

@mateuszbaran Do you want to review this still (and should I wait) or can this be merged?

As you mentioned in the linked topic, we surely could use this for SGD as well – since that, until now, also uses the same representation we originally had here: a vector of gradient functions. I do not think that is a “strange” representation, since it allows exactly what SGD needs: single gradient evaluations. But that is probably something for a next PR.

kellertuer commented 3 months ago

I noticed it might be nice to provide Hessians for the constraints.

This is in principle done; just tests (and reading the docs again) are missing. Though for now the new methods introduce ambiguities that I do not understand the origin of (all are created by replacing `p,` with `p, X,` in the arguments). Since this affects all new Hessians, they should work just like the gradient ones (where we have no ambiguities), but somehow they don't. I am confused.

kellertuer commented 3 months ago

Found the problem, just surprised that it does not appear for the gradient as well. But this is now nearly finished (again). Just waiting for code cov.

kellertuer commented 3 months ago

Now I would consider this actually and really done :)

kellertuer commented 3 months ago

Just if you have time, could you check whether I can merge this?

I would start with the other (ambiguity) issue then – nothing urgent, for sure.

mateuszbaran commented 3 months ago

There are still a few things I'd like to check. Could you wait with merging until the end of this week?

kellertuer commented 3 months ago

Sure, no problem. I value your feedback and this is not urgent; still, thanks for the rough time frame – I will check back on this on Saturday, or even Monday, then.

mateuszbaran commented 3 months ago

I did a whole bunch of minor improvements: fixing typos, adding type bounds, a couple of missing tests. Let me know if there is anything you don't like. By the way, it looks like using i and j for constraint indexing is a bit mixed. I converted some i to j but I wasn't particularly thorough.

kellertuer commented 3 months ago

Thanks, having a solid design was my main goal. Performance is probably something we can look at in the future then.

kellertuer commented 3 months ago

By the way, it looks like using i and j for constraint indexing is a bit mixed. I converted some i to j but I wasn't particularly thorough.

Yes, I think I started with having `i` for inequality and `j` for equality and got that mixed up a bit later. Hope that is not too bad – if it is confusing we should make it more consistent, but for me it is also fine as is now, since the two never mix anywhere.

kellertuer commented 3 months ago

I did a whole bunch of minor improvements: fixing typos, adding type bounds, a couple of missing tests. Let me know if there is anything you don't like.

Wow, thanks for fixing all those :) they all look like very great fixes; for bounds I am sometimes not sure when they are useful and when we can skip them, so I trust your expertise here.

mateuszbaran commented 3 months ago

Yes, I think I started with having `i` for inequality and `j` for equality and got that mixed up a bit later. Hope that is not too bad – if it is confusing we should make it more consistent, but for me it is also fine as is now, since the two never mix anywhere.

That's not a big issue, but there were a few places where a docstring and the method it was attached to used different conventions, or the type signature in a docstring differed from the docstring text. Anyway, that's a minor consistency issue.

Wow, thanks for fixing all those :) they all look like very great fixes; for bounds I am sometimes not sure when they are useful and when we can skip them, so I trust your expertise here.

You're welcome :). I add bounds for type parameters mainly because then I know what to expect there. If a field is a function or a functor object, I prefer no bound. Otherwise I prefer to have a bound unless there is a good reason against it (for example one reason may be wrapping objects from weak dependencies).