Closed — kellertuer closed this 1 month ago
Attention: Patch coverage is 99.85207% with 1 line in your changes missing coverage. Please review.
Project coverage is 99.78%. Comparing base (21dd646) to head (21535f1).
| Files | Patch % | Lines |
|---|---|---|
| src/solvers/trust_regions.jl | 75.00% | 1 Missing :warning: |
@mateuszbaran just if you have time, do you want to take a look? I now even added a 1D Euclidean example where the gradient of the merit function (the one used for line search) comes out wrong (see the check at the end), but everything I computed by hand indicates that all formulae are correct, so I am a bit out of ideas (after about 10 hours of checking handwritten computations against the code).
I even found a small bug yesterday during those 10 hours, but it seems that was not the only one. (The only other things to check are the updates of sigma and rho, but the main task is to find out why the IPNewton does not do the right thing.)
I currently suspect `CondensedKKTVectorFieldJacobian` (say A) is the matrix and `CondensedKKTVectorField` (say b) is minus the right-hand side, so what we solve is AX + b = 0. All formulae are documented both close to the style from the paper of Lai & Yoshise (adapted variable names) and in a form nicer to read (only docs missing: parameters of IPN). I personally am slowly turning both blind and dumb on this code; I just do not see through it anymore.
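To make the sign convention concrete, here is a hedged toy sketch (plain matrices standing in for the actual `CondensedKKTVectorFieldJacobian` and `CondensedKKTVectorField` structs) of the system being solved:

```julia
# Toy stand-ins, not the Manopt.jl structs: A plays the role of the condensed
# KKT Jacobian, b of the condensed KKT vector field. The Newton direction X
# then solves A*X + b = 0, i.e. A*X = -b.
A = [4.0 1.0; 1.0 3.0]   # symmetric positive definite stand-in for the Jacobian
b = [1.0, 2.0]           # stand-in for the vector field (minus the RHS)
X = A \ (-b)             # direct solve; the PR uses a conjugate residual subsolver
@assert maximum(abs.(A * X + b)) < 1e-12   # residual of A*X + b = 0 vanishes
```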
edit: I found one bug today that fixed the `CondensedKKTVectorField`.
Sure, I will take a look. I've tried running `test_interior_point_newton_simple.jl` (I think this is the right file?) and there are multiple errors, but the last `check_gradient(N, F, grad_F, q, qY; plot=true, error=:info)` actually runs fine, with the exception that the gradient is wrong.
Well, that is the simple one (x^2 on the real line with the constraint x >= 1), so I can quickly check; I had hoped I always copied over the last versions I checked with. Yeah, my main worry for now is the F (`KKTVectorFieldSquared`, or merit function in the paper).
edit: yeah, do not (only half) rename `p_0` to `p` nor `st` to `res`. It should run now. Note that the solver currently does 0 iterations since I am mainly checking all possible functions at the start point. If that works I wanted to do a first step and so on.
And F (`grad_F`) from the merit function (`KKTVectorFieldGradNormSq`) is the one still failing, but the rest I checked by hand at a few points and I am confident the subsolver (conjugate residual) is working fine.
Note that all structs/functions are already documented fully in our notation. I did that as a first step to check the implementations.
I think I've fixed it by multiplying by 2?
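The factor of 2 comes from the chain rule for a squared norm: for a merit function f(x) = ||F(x)||^2 one has grad f(x) = 2 J_F(x)' F(x). A toy sketch (with a made-up vector field, not the real KKT field) checked against finite differences:

```julia
# Toy vector field and its Jacobian — illustrative only, not the KKT field.
F(x)  = [x[1]^2 - 1, x[1] * x[2]]
JF(x) = [2x[1] 0.0; x[2] x[1]]
f(x)  = sum(abs2, F(x))           # merit function ||F(x)||^2

x = [1.5, 2.0]
g = 2 .* (JF(x)' * F(x))          # gradient WITH the factor 2

# independent check via central finite differences
h = 1e-6
e(i) = [j == i ? 1.0 : 0.0 for j in 1:2]
fd = [(f(x .+ h .* e(i)) - f(x .- h .* e(i))) / (2h) for i in 1:2]
@assert maximum(abs.(g .- fd)) < 1e-4   # dropping the 2 would fail this check
```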
oh :D let me also check the larger example then. As I said, I have by now checked so many things by hand that I was sure I am just not able to see the last trick I am missing (but still found two small bugs today).
The larger example (still not too large, but at least Riemannian), `test_interior_point_Newton.jl` from my student, still has a wrong gradient, at least after 5 steps. But I feel it can not be off by far if the Euclidean one is now correct.
I can give you a hint that the error is in the first component of the gradient:

```julia
julia> ForwardDiff.gradient(k -> F(N, k), q)
([-5.578785081155688, 0.8210125930525509, -1.1470345086837803], [3.80509088710519, 1.195579403962125, 2.567637759181358], Float64[], [3.850822689745532, 2.5016182248665264, 2.4737559598127277])

julia> grad_F(N, q)
([-5.569136798420136, 1.0900342547693727, -0.8725353489543635], [3.80509088710519, 1.195579403962125, 2.567637759181358], Float64[], [3.850822689745532, 2.5016182248665264, 2.4737559598127277])
```
The large or the small example? That looks more like the larger one on the sphere. Are you sure ForwardDiff then provides a Riemannian gradient?
The large one, around line 80 in `test_interior_point_Newton.jl`. ForwardDiff's gradient passes the `check_gradient` at least.
I am a bit surprised by that, since that reads like the Euclidean gradient, but that component is actually on the sphere (the other 3 are parameters and hence Euclidean).
I was hoping that is correct, but I will start tomorrow checking f and the components of g as well as F and its gradient again. Sadly, grad_F is one of the most involved formulae (I "fought" through the formula of the condensed KKT and its Jacobian today).
Actually that also surprised me after some thinking, and I've been digging further. When the first component is projected onto the sphere to make it Riemannian, actually the same result as `KKTVectorFieldNormSqGradient` is returned, and `check_gradient` of course fails.
Thanks for checking; that seems quite strange, since the norm of the KKT vector field always and only involves norms of tangents, so I am surprised it turns out to be Euclidean-correct. For now I do not understand what that means. Maybe one of the Hessians is wrong (f or g)?
Ah, I've found it. The check is wrong: the vector `X`, the direction in which we check correctness of the gradient, is not a tangent vector.
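To illustrate the issue (a hedged sketch, not the actual `check_gradient` internals): on the sphere, a direction used for a gradient check must be tangent, i.e. orthogonal to the base point, which an orthogonal projection enforces.

```julia
using LinearAlgebra

p = [0.0, 0.0, 1.0]                   # a point on the unit sphere S^2
X = [1.0, 2.0, 3.0]                   # arbitrary vector — NOT tangent at p
project(p, X) = X - dot(p, X) * p     # orthogonal projection onto T_p S^2
Xt = project(p, X)
@assert abs(dot(p, Xt)) < 1e-14       # Xt is now a valid tangent direction
```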
Oh, hehe! As I said in the first post, this is quite some lines of code, so I lost a bit of the overview after these weeks of working on it. Thanks for finding that.
You're welcome :slightly_smiling_face: . Sometimes looking at something from outside helps with finding simple issues.
Nice! Now the gradient is correct and the algorithm stays feasible for the first 400 iterations. So hopefully an Armijo step size and checking the rho/sigma updates fix the rest.
Yes, I usually try for quite a while myself and was sure this could only be a strange bug. It was ;)
So, it even converges: starting on the boundary on the right at [0,1,1] (normalized back) with the minimiser at the north pole, it only takes 5k iterations to reach it by now. Since it does reach the point, I would assume the main remaining problem is updating the parameters (barrier and such).
edit: Ah, with Armijo instead of a constant step size it is already at about 150. So it seems this was a good next step.
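For reference, a minimal Armijo backtracking sketch (a generic Euclidean version, not the Manopt.jl implementation) of the step-size rule that replaced the constant one:

```julia
# Backtrack from t = 1 until the sufficient-decrease condition
# f(x + t*d) <= f(x) + c * t * <g, d> holds.
function armijo_step(f, x, d, g; β=0.5, c=1e-4, maxiter=50)
    t = 1.0
    fx, slope = f(x), g' * d
    for _ in 1:maxiter
        f(x .+ t .* d) <= fx + c * t * slope && return t
        t *= β
    end
    return t
end

f(x) = sum(abs2, x)
x = [1.0, 1.0]; g = 2 .* x; d = -g    # steepest descent direction
t = armijo_step(f, x, d, g)
@assert f(x .+ t .* d) < f(x)          # the accepted step decreases f
```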
Thanks for the help yesterday. I think I am now really just missing an improved stopping criterion (probably by implementing the KKT residual); it now seems to work quite well (besides stopping a bit late).
For the example from above, I am surprised we need a very large KKTResidual threshold just to get to this after 65 iterations, but it looks like it now works (the initial point is exactly on the boundary, not outside).
The main thing missing yesterday was, that in the linesearch there is a $\tau$ parameter in the centrality conditions (not yet fully documented) that should be updated during the iterations. That gave the speedup.
So besides documentation (writing the rest and reading everything carefully), this only needs test coverage. Happy to have this working finally.
I now carefully documented all functions and found a small bug in one check function that sadly makes the step sizes a bit more conservative than necessary. Will check that later.
Besides that, there are only 2 or 3 special cases of the high-level interface missing in test coverage (about 20 lines), so there should not be much work left.
So I'll set this to ready-to-review already.
While Markus did start an interface in Manifolds.jl with `p::Number`, this seems to be a bit too involved, for me at least, since the product manifolds might have to be adapted as well. I'll leave that out until there is a need for it.
So, besides maybe also checking the default values for performance, this is finished.
Sure, IPM with `Number` points seems mostly useless.
Closed-form subsolver was bothering me a bit so I've implemented #403 in places where it was added in this PR. On master I think only difference-of-convex is using it?
Frank-Wolfe is using it as well, the others are not, but for consistency it would be nice to have that unified across all subsolvers.
For the number case โ exactly, I also feel that is not so much a useful case.
The one thing still bothering me is performance; I can not figure out how to set up the centrality condition (within the linesearch) to produce okay-ish step sizes.
Currently they are super conservative (they are just meant to never leave the feasible set).
I can easily get them much smaller (`1e-430` or so) but not larger. And I think I follow the paper 100%. Well.
Maybe I'll write the authors an email when this is published.
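For context, a hedged sketch of the "fraction to the boundary" rule common in Euclidean interior point methods, which is one standard way such conservative, feasibility-preserving step sizes arise (illustrative code, not the PR's implementation):

```julia
# Choose the largest t <= 1 keeping the slacks s + t*Δs strictly positive,
# backed off by a factor τ < 1 so we never touch the boundary exactly.
function max_feasible_step(s, Δs; τ=0.995)
    t = 1.0
    for i in eachindex(s)
        Δs[i] < 0 && (t = min(t, -τ * s[i] / Δs[i]))
    end
    return t
end

s  = [1.0, 0.5]       # slack variables, must stay > 0
Δs = [-2.0, 0.1]      # proposed update direction
t  = max_feasible_step(s, Δs)
@assert all(s .+ t .* Δs .> 0)   # the capped step stays strictly feasible
```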
OK, I will update the other solvers too then.
Did you compare the centrality condition with any existing implementation of IPM? It looks like Euclidean methods should have something very similar.
No, I just stumbled upon that today after 4pm when I documented the last doc string (that of the centrality condition).
Ah, interesting: just accepting all (`centrality_condition=(M, p) -> true`) works better than any other check. Very interesting. I will keep that in one test just to investigate further at some point later.
Oh, I have to work a bit more on codecov in the next days (I deactivated too much debug); but not today, and maybe neither tomorrow nor the day after (it was a busy week with work, though I am actually on vacation).
OK, interesting. I still have a few things I'd like to check here. I think I've finished the closed-form subsolver rework, and even found an issue with a test while doing that.
I read a bit about the centrality condition. It is only necessary for global convergence, so locally it might not be needed. In Lai-Yoshise it is also very similar to the work by El-Bakry et al. from 1996, https://link.springer.com/article/10.1007/BF02275347, Section 6.
My problem is that they are both a bit vague about โthe starting pointโ ($v_0$ in the paper above). Originally we set this every iteration. That worked okay-ish. But looking at the Matlab code of Lai-Yoshise, they set that to really just the initial values. Which performs worse than setting it every iteration. And not having that condition checked performs better than both.
I've taken a look at line search approaches for Euclidean IPM and it turned out to be very involved. Take a look at, for example, section 17.4 or chapter 19 in https://link.springer.com/book/10.1007/978-3-031-08720-2 .
Anyway, if you want to check what performs better or worse, it would be good to test it on a larger set of problems.
I actually do not want to check that. I would be fine with a reasonable default for now.
That's perfectly understandable :smile: .
I think for now I would just set `centrality_condition` to either `missing` or, equivalently, `(M, p) -> true`, that is, accept all steps, and write a note that this ensures convergence locally, and for global convergence one sets it to the centrality condition.
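A small sketch of that default as a pluggable predicate (all names here are illustrative, not the Manopt.jl API): with an accept-all condition, only the decrease test restricts the step.

```julia
# Accept-all predicate, matching the proposed default `(M, p) -> true`.
accept_all = (M, p) -> true

# Hypothetical backtracking loop with a pluggable extra condition.
function backtrack(f, x, d; cond=accept_all, β=0.5, maxiter=30)
    t = 1.0
    for _ in 1:maxiter
        x_new = x .+ t .* d
        (f(x_new) < f(x) && cond(nothing, x_new)) && return t
        t *= β
    end
    return t
end

t = backtrack(x -> sum(abs2, x), [1.0, 1.0], [-1.0, -1.0]; cond=accept_all)
@assert t == 1.0   # with accept-all, the full step is taken immediately
```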
But I will check that in the evening when checking the remaining code coverage as well.
I think I am finished with my part, including codecov. I kept a new point for the closed-form solver part and its code coverage. That is the only part missing here.
I think the only two sub solvers not yet following the new style are
Oh, TR looks a bit more involved; the code is quite old and might also benefit from a bit of improvement anyway. Do you want to do that or shall I?
I can adapt TR.
We have all others and proper code cov.
Oh, I introduced an ambiguity? Why did that not happen for the other solvers? I do not directly see how to resolve that.
I've adapted TR and resolved the ambiguity.
I am not so sure where we lost the 2 lines on TR, but the rest seems to be fine.
Yay! I'd call this finished, but would be happy if you could approve it as well. We can also wait until tomorrow with merging.
I would like to take one more look at this PR tomorrow, mostly to check if there are any major performance issues or some small documentation improvements.
Sure, if you find something and do a commit, you can also adapt the date in the changelog. I can check back here tomorrow afternoon then, since I will be on a hike tomorrow (one we can also do when you are here).
Performance is not great, but to improve that I need to extend `PowerManifold` a bit first. I think this would be better suited for a separate PR.
I think so as well; this took quite a while to get done, so working is my first goal here, performance the next one. I did work through most of the access functions used for constraints and, for example, used the single-access ones for performance.
This is my (third) summer project: initial code for a Riemannian Interior Point Newton method based on the master thesis of @mstokkenes, where of course documentation and testing are not part of a master thesis, but exploring how this algorithm works and illustrating that it works. A Matlab code for the same idea that this is based on (and from the authors of the original paper) is https://github.com/GALVINLAI/RIPM, though here the notation is already much closer to that of Manopt.jl.
🛣️ Roadmap
- `VectorialHessianFunction`s, which are newer than the original code base
- `is_feasible` to work in a more general sense than just for RIPM, and (re-)introduce `DebugFeasibility()`
- `stepsize_objective` and, for the additional check, use `centrality_condition`
- σ and ρ (Section 8.1 YL) and their update