Get OptimizationManopt ready for registration

SciML / Optimization.jl

Mathematical Optimization in Julia. Local, global, gradient-based and derivative-free. Linear, Quadratic, Convex, Mixed-Integer, and Nonlinear Optimization in one simple, fast, and differentiable interface.

https://docs.sciml.ai/Optimization/stable/

MIT License

688 stars 75 forks source link

Get OptimizationManopt ready for registration #763

Closed Vaibhavdixit02 closed 1 month ago

Vaibhavdixit02 commented 1 month ago

Checklist

[ ] Appropriate tests were added
[ ] Any code changes were done in a way that does not break public API
[ ] All documentation related to code changes were updated
[ ] The new code follows the contributor guidelines, in particular the SciML Style Guide and COLPRAC.
[ ] Any new documentation only uses public API

Additional context

Add any other context about the problem here.

mateuszbaran commented 1 month ago

Hi! I will check your PR soon. A few quick comments:

@mateuszbaran can you review the hessian wrraper here, it's incorrect right now the issue has to do with the representation of the riemannian hessian which seems to be a vector and not matrix, Ronny did comment on this a bit in another issue but I think I might not have understood it well?

Manopt currently doesn't support actual full Hessians. riemannian_Hessian corresponds to Hessian-vector products. Actually even the theory of converting Euclidean Hessians to Riemannian ones isn't particularly well developed.

Also for the sgd test I am not sure how to handle the gradient, maybe it can't be supported right now?

SGD currently has a somewhat weird interface (you need to provide a list of functions that are called in some order) but with https://github.com/JuliaManifolds/Manopt.jl/pull/386 (or soon after) we should have a nicer interface with batch number passed as an index. I'd suggest waiting a bit instead of trying the current interface.

kellertuer commented 1 month ago

For the Hessian, in Euclidean sense we do not return the matrix $H$ but always need a direction $d$ and compute $Hd$.

So instead of providing a point (we want the Hessian at) and returning a matrix, we take a point and. a direction (tangent vector) and the result $Hd$ is actually also a tangent vector we return. That especially avoids having to work in charts / bases of tangent spaces (which the matrix form would require).

The theory for (Euclidean) Hessian to Riemannian Hessian conversion is quite developed, but there is basically a chain rule involved where the Christoffel symbol appear, which makes that a bit tricky to implement every now and then.

Vaibhavdixit02 commented 1 month ago

Oops I thought I replied here earlier, sorry. Thanks for the clarifications both, it's very helpful!

Based on these suggestions I am leaning towards not supporting SGD (along with the proximal methods) for now. I will try to get the hessian methods working now that's clear, hopefully, we can wrap this first version up pretty soon.

kellertuer commented 1 month ago

I think that is a good choice. SGD might get a bit of a rework after the PR that is currently being finished. Proximal Methods should stay as they are, but I do see they are not so super common in general.

mateuszbaran commented 1 month ago

The theory for (Euclidean) Hessian to Riemannian Hessian conversion is quite developed, but there is basically a chain rule involved where the Christoffel symbol appear, which makes that a bit tricky to implement every now and then.

Yes, that is known, I forgot to mention that I mean the chart-free and coordinate-free variant where you start from the Hessian matrix in the embedding (flattened to one index). It doesn't look hopelessly impossible for symmetric spaces so maybe we will have that at some point?

kellertuer commented 1 month ago

I am not so sure, for now I can only do that for every manifold individually, using “Christoffelistical tools” and hoping the result can be expressed chart-free.

kellertuer commented 1 month ago

I tried to understand the code a bit for the last hour while trying to help with the questions raised. I am not so able to understand what this is doing. Am I just not clever enough or is the code intended to be “just a hack” that is understood by exactly the person that wrote it?

From the current errors a conclusion might also be that Manopt.jl – though started about 7 years ago, but many developed by me with the help of a few students – is maybe not mature enough if it errors so much?

Vaibhavdixit02 commented 1 month ago

What are the hacks that you keep seeing, I don't think I am doing anything here that should be referred to as that but if you have some specific parts you don't find rigorous please suggest them. I want to think this has been going pretty well and your comments have been respected and addressed so it's confusing why you have that view.

From the current errors a conclusion might also be that Manopt.jl – though started about 7 years ago, but many developed by me with the help of a few students – is maybe not mature enough if it errors so much?

No piece of software exists without bugs, 1000 people teams of exceptional engineers ship things like Microsoft teams. I didn't mean anything negative when I mentioned "possible" bugs above since I am not 100% sure where or why that's happening yet, and it seemed you and/or @mateuszbaran would have more insight about it.

kellertuer commented 1 month ago

What are the hacks that you keep seeing, I don't think I am doing anything here that should be referred to as that but if you have some specific parts you don't find rigorous please suggest them.

As I just wrote in a comment – for now I do not understand much of this (or the last PRs) code. But whenever I spent half an hour or 45 minutes like figuring out that one line with the hasfield hack – it is all more like hacked. Finding all of them – well I do have a usual job as well, so I can not read all your hurried PRs – sorry.

The same for the insight on bugs. I can check that, but probably not before end of June. So then the question is – can this wait so long or whether this has to be merged in 2-3 days and is in a hurry? Last PR I had that feeling, all has to be rushed. This would not help reducing bugs.

kellertuer commented 1 month ago

Oh – and to clarify – I also did not understand it as negative. Just that if Manopt is too buggy, it should maybe not published as an interface here. Even more if no one finds time to fix them. I am happy if people would use Manopt, but the current PR here shows a bit more that one person alone – even now 1.5 since Mateusz joined quite a bit – is too less coverage for a package to be maintained. So this is not due to your feedback being negative, it is not. Just whether something like Manopt.jl would be worth being added.

kellertuer commented 1 month ago

This endet up being quite a rush again. Most tests fail still, but it was merged nevertheless.

I‘ll wait for a more stable version before I add it to Manopt then.

Vaibhavdixit02 commented 1 month ago

Those are Downgrade tests - the failure is completely unrelated to any work here. All tests that were expected to pass here and the example in the docs are all being run on CI and passing, but I'll leave the decision to you.

kellertuer commented 1 month ago

We discussed 2 days ago that I feel this is all in a rush. I do not yet follow most of the code as we discussed here, nor can I reproduce the bugs you claim. Hence I lack a bit of confidence here. That is nothing against your work, just my trust to it.

ChrisRackauckas commented 1 month ago

https://github.com/SciML/Optimization.jl/actions/runs/9359053652/job/25762077271 this wasn't a downgrade test?

Vaibhavdixit02 commented 1 month ago

kellertuer commented 1 month ago

That for me just illustrates that at least these are failing.

However, I do not want to intervene with the common practices in SciML or this package; my approach is different, but the only conclusion I draw from this I already mentioned.