Closed — kellertuer closed this 1 month ago
Attention: Patch coverage is 99.85207% with 1 line in your changes missing coverage. Please review.
Project coverage is 99.78%. Comparing base (21dd646) to head (21535f1).
| Files | Patch % | Lines |
|---|---|---|
| src/solvers/trust_regions.jl | 75.00% | 1 Missing :warning: |
@mateuszbaran just if you have time, do you want to take a look? I now even added a 1D Euclidean example where the gradient of the merit function (the one used for line search) comes out wrong (see the check at the end), but everything I computed by hand indicates that all formulae are correct, so I am a bit out of ideas (after about 10 hours of checking handwritten computations against the code).
I even found a small bug yesterday during those 10 hours, but it seems that was not the only one. (The only other things to check are the updates of sigma and rho, but the main task is to find out why the IPNewton does not do the right thing.)
I currently suspect `CondensedKKTVectorFieldJacobian` (say A) is the matrix and `CondensedKKTVectorField` (say b) is minus the right-hand side, so what we solve is AX + b = 0. All formulae are documented both close to the style from the paper of Lai & Yoshise (adapted variable names) and in a form nicer to read (only docs missing: parameters of IPN). I personally am slowly turning both blind and dumb on this code; I just do not see through it anymore.
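To make the sign convention concrete, here is a hedged toy sketch (plain matrices standing in for the actual `CondensedKKTVectorFieldJacobian` and `CondensedKKTVectorField` structs) of the system being solved:

```julia
# Toy stand-ins, not the Manopt.jl structs: A plays the role of the condensed
# KKT Jacobian, b of the condensed KKT vector field. The Newton direction X
# then solves A*X + b = 0, i.e. A*X = -b.
A = [4.0 1.0; 1.0 3.0]   # symmetric positive definite stand-in for the Jacobian
b = [1.0, 2.0]           # stand-in for the vector field (minus the RHS)
X = A \ (-b)             # direct solve; the PR uses a conjugate residual subsolver
@assert maximum(abs.(A * X + b)) < 1e-12   # residual of A*X + b = 0 vanishes
```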
edit: I found one bug today that fixed the `CondensedKKTVectorField`.
Sure, I will take a look. I've tried running `test_interior_point_newton_simple.jl` (I think this is the right file?) and there are multiple errors, but the last `check_gradient(N, F, grad_F, q, qY; plot=true, error=:info)` actually runs fine, with the exception that the gradient is wrong.
Well, that is the simple one (x^2 on the real line with the constraint x >= 1), so I can quickly check; I had hoped I always copied over the last versions I checked with. Yeah, my main worry for now is the F (`KKTVectorFieldSquared`, or merit function in the paper).
edit: yeah, do not (only half) rename `p_0` to `p` nor `st` to `res`. It should run now. Note that the solver currently does 0 iterations since I am mainly checking all possible functions at the start point. If that works I wanted to do a first step and so on.
And F (`grad_F`) from the merit function (`KKTVectorFieldGradNormSq`) is the one still failing, but the rest I checked by hand at a few points and I am confident the subsolver (conjugate residual) is working fine.
Note that all structs/functions are already documented fully in our notation. I did that as a first step to check the implementations.
I think I've fixed it by multiplying by 2?
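The factor of 2 comes from the chain rule for a squared norm: for a merit function f(x) = ||F(x)||^2 one has grad f(x) = 2 J_F(x)' F(x). A toy sketch (with a made-up vector field, not the real KKT field) checked against finite differences:

```julia
# Toy vector field and its Jacobian — illustrative only, not the KKT field.
F(x)  = [x[1]^2 - 1, x[1] * x[2]]
JF(x) = [2x[1] 0.0; x[2] x[1]]
f(x)  = sum(abs2, F(x))           # merit function ||F(x)||^2

x = [1.5, 2.0]
g = 2 .* (JF(x)' * F(x))          # gradient WITH the factor 2

# independent check via central finite differences
h = 1e-6
e(i) = [j == i ? 1.0 : 0.0 for j in 1:2]
fd = [(f(x .+ h .* e(i)) - f(x .- h .* e(i))) / (2h) for i in 1:2]
@assert maximum(abs.(g .- fd)) < 1e-4   # dropping the 2 would fail this check
```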
oh :D let me also check the larger example then. As I said, I have by now checked so many things by hand that I was sure I am just not able to see the last trick I am missing (but still found two small bugs today).
The larger example (still not too large, but at least Riemannian), `test_interior_point_Newton.jl` from my student, still has a wrong gradient, at least after 5 steps. But I feel it can not be off by far if the Euclidean one is now correct.
I can give you a hint that the error is in the first component of the gradient:

```julia
julia> ForwardDiff.gradient(k -> F(N, k), q)
([-5.578785081155688, 0.8210125930525509, -1.1470345086837803], [3.80509088710519, 1.195579403962125, 2.567637759181358], Float64[], [3.850822689745532, 2.5016182248665264, 2.4737559598127277])

julia> grad_F(N, q)
([-5.569136798420136, 1.0900342547693727, -0.8725353489543635], [3.80509088710519, 1.195579403962125, 2.567637759181358], Float64[], [3.850822689745532, 2.5016182248665264, 2.4737559598127277])
```
The large or the small example? That looks more like the larger one on the sphere. Are you sure ForwardDiff then provides a Riemannian gradient?
The large one, around line 80 in `test_interior_point_Newton.jl`. ForwardDiff's gradient passes the `check_gradient` at least.
I am a bit surprised by that, since that reads like the Euclidean gradient, but that component is actually on the sphere (the other 3 are parameters and hence Euclidean).
I was hoping that is correct, but I will start tomorrow checking f and the components of g as well as F and its gradient again. Sadly, grad_F is one of the most involved formulae (I "fought" through the formula of the condensed KKT and its Jacobian today).
Actually that also surprised me after some thinking, and I've been digging further. When the first component is projected onto the sphere to make it Riemannian, actually the same result as `KKTVectorFieldNormSqGradient` is returned, and `check_gradient` of course fails.
Thanks for checking; that seems quite strange, since the norm of the KKT vector field always and only involves norms of tangents, so I am surprised it turns out to be Euclidean-correct. For now I do not understand what that means. Maybe one of the Hessians is wrong (f or g)?
Ah, I've found it. The check is wrong: the vector `X`, the direction in which we check correctness of the gradient, is not a tangent vector.
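To illustrate the issue (a hedged sketch, not the actual `check_gradient` internals): on the sphere, a direction used for a gradient check must be tangent, i.e. orthogonal to the base point, which an orthogonal projection enforces.

```julia
using LinearAlgebra

p = [0.0, 0.0, 1.0]                   # a point on the unit sphere S^2
X = [1.0, 2.0, 3.0]                   # arbitrary vector — NOT tangent at p
project(p, X) = X - dot(p, X) * p     # orthogonal projection onto T_p S^2
Xt = project(p, X)
@assert abs(dot(p, Xt)) < 1e-14       # Xt is now a valid tangent direction
```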
Oh, hehe! As I said in the first post, this is quite some lines of code, so I lost a bit of the overview after these weeks of working on it. Thanks for finding that.
You're welcome :slightly_smiling_face: . Sometimes looking at something from outside helps with finding simple issues.
Nice! Now the gradient is correct and the algorithm stays feasible for the first 400 iterations. So hopefully an Armijo step size and checking the rho/sigma updates fix the rest.
Yes, I usually try for quite a while myself and was sure this could only be a strange bug. It was ;)
So, it even converges: starting on the boundary on the right at [0,1,1] (normalized back) with the minimiser at the north pole, it only takes 5k iterations to reach it by now. Since it does reach the point, I would assume the main remaining problem is updating the parameters (barrier and such).
edit: Ah, with Armijo instead of a constant step size it is already at about 150. So it seems this was a good next step.
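For reference, a minimal Armijo backtracking sketch (a generic Euclidean version, not the Manopt.jl implementation) of the step-size rule that replaced the constant one:

```julia
# Backtrack from t = 1 until the sufficient-decrease condition
# f(x + t*d) <= f(x) + c * t * <g, d> holds.
function armijo_step(f, x, d, g; β=0.5, c=1e-4, maxiter=50)
    t = 1.0
    fx, slope = f(x), g' * d
    for _ in 1:maxiter
        f(x .+ t .* d) <= fx + c * t * slope && return t
        t *= β
    end
    return t
end

f(x) = sum(abs2, x)
x = [1.0, 1.0]; g = 2 .* x; d = -g    # steepest descent direction
t = armijo_step(f, x, d, g)
@assert f(x .+ t .* d) < f(x)          # the accepted step decreases f
```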
Thanks for the help yesterday. I think I am now really just missing an improved stopping criterion (probably by implementing the KKT residual); it now seems to work quite well (besides stopping a bit late).
For the example from above, I am surprised we need a very large KKTResidual threshold just to get to this after 65 iterations, but it looks like it now works (the initial point is exactly on the boundary, not outside).
The main thing missing yesterday was, that in the linesearch there is a $\tau$ parameter in the centrality conditions (not yet fully documented) that should be updated during the iterations. That gave the speedup.
So besides documentation (writing the rest and reading everything carefully), this only needs test coverage. Happy to have this working finally.
I now carefully documented all functions and found a small bug in one check function that sadly makes the step sizes a bit more conservative than necessary. Will check that later.
Besides that, there are only 2 or 3 special cases of the high-level interface missing in test coverage (about 20 lines), so there should not be much work left.
So I'll set this to ready-to-review already.
While Markus did start an interface in Manifolds.jl with `p::Number`, this seems to be a bit too involved, for me at least, since the product manifolds might have to be adapted as well. I'll leave that out until there is a need for it.
So, besides maybe also checking the default values for performance, this is finished.
Sure, IPM with `Number` points seems mostly useless.
Closed-form subsolver was bothering me a bit so I've implemented #403 in places where it was added in this PR. On master I think only difference-of-convex is using it?
Frank-Wolfe is using it as well, the others are not, but for consistency it would be nice to have that unified across all subsolvers.
For the number case โ exactly, I also feel that is not so much a useful case.
The one thing still bothering me is performance; I can not figure out how to set up the centrality condition (within the linesearch) to produce okay-ish step sizes.
Currently they are super conservative (they are just meant to never leave the feasible set).
I can easily get them much smaller (`1e-430` or so) but not larger. And I think I follow the paper 100%. Well.
Maybe I'll write the authors an email when this is published.
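For context, a hedged sketch of the "fraction to the boundary" rule common in Euclidean interior point methods, which is one standard way such conservative, feasibility-preserving step sizes arise (illustrative code, not the PR's implementation):

```julia
# Choose the largest t <= 1 keeping the slacks s + t*Δs strictly positive,
# backed off by a factor τ < 1 so we never touch the boundary exactly.
function max_feasible_step(s, Δs; τ=0.995)
    t = 1.0
    for i in eachindex(s)
        Δs[i] < 0 && (t = min(t, -τ * s[i] / Δs[i]))
    end
    return t
end

s  = [1.0, 0.5]       # slack variables, must stay > 0
Δs = [-2.0, 0.1]      # proposed update direction
t  = max_feasible_step(s, Δs)
@assert all(s .+ t .* Δs .> 0)   # the capped step stays strictly feasible
```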
OK, I will update the other solvers too then.
Did you compare the centrality condition with any existing implementation of IPM? It looks like Euclidean methods should have something very similar.
No, I just stumbled upon that today after 4pm when I documented the last doc string (that of the centrality condition).
Ah, interesting: just accepting all (`centrality_condition=(M, p) -> true`) works better than any other check. Very interesting. I will keep that in one test just to investigate further at some point later.
Oh, I have to work a bit more on codecov in the next days (I deactivated too much debug); but not today, and maybe neither tomorrow nor the day after (it was a busy week with work, though I am actually on vacation).
OK, interesting. I still have a few things I'd like to check here. I think I've finished the closed-form subsolver rework, and even found an issue with a test while doing that.
I read a bit about the centrality condition. It is only necessary for global convergence, so locally it might not be needed. In Lai-Yoshise it is also very similar to the work by El-Bakry et al. from 1996, https://link.springer.com/article/10.1007/BF02275347, Section 6.
My problem is that they are both a bit vague about โthe starting pointโ ($v_0$ in the paper above). Originally we set this every iteration. That worked okay-ish. But looking at the Matlab code of Lai-Yoshise, they set that to really just the initial values. Which performs worse than setting it every iteration. And not having that condition checked performs better than both.
I've taken a look at line search approaches for Euclidean IPM and it turned out to be very involved. Take a look at, for example, section 17.4 or chapter 19 in https://link.springer.com/book/10.1007/978-3-031-08720-2 .
Anyway, if you want to check what performs better or worse, it would be good to test it on a larger set of problems.
I actually do not want to check that. I would be fine with a reasonable default for now.
That's perfectly understandable :smile: .
I think for now I would just set `centrality_condition` to either `missing` or, equivalently, `(M, p) -> true`, that is, accept all steps, and write a note that this ensures convergence locally, and for global convergence one sets it to the centrality condition.
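A small sketch of that default as a pluggable predicate (all names here are illustrative, not the Manopt.jl API): with an accept-all condition, only the decrease test restricts the step.

```julia
# Accept-all predicate, matching the proposed default `(M, p) -> true`.
accept_all = (M, p) -> true

# Hypothetical backtracking loop with a pluggable extra condition.
function backtrack(f, x, d; cond=accept_all, β=0.5, maxiter=30)
    t = 1.0
    for _ in 1:maxiter
        x_new = x .+ t .* d
        (f(x_new) < f(x) && cond(nothing, x_new)) && return t
        t *= β
    end
    return t
end

t = backtrack(x -> sum(abs2, x), [1.0, 1.0], [-1.0, -1.0]; cond=accept_all)
@assert t == 1.0   # with accept-all, the full step is taken immediately
```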
But I will check that in the evening when checking the remaining code coverage as well.
I think I am finished with my part, including codecov. I kept a new point for the closed-form solver part and its code coverage. That is the only part missing here.
I think the only two sub solvers not yet following the new style are
Oh, TR looks a bit more involved; the code is quite old and might also benefit from a bit of improvement anyway. Do you want to do that or shall I?
I can adapt TR.
We have all others and proper code cov.
Oh, I introduced an ambiguity? Why did that not happen for the other solvers? I do not directly see how to resolve that.
I've adapted TR and resolved the ambiguity.
I am not so sure where we lost the 2 lines on TR, but the rest seems to be fine.
Yay! I'd call this finished, but would be happy if you could approve it as well. We can also wait until tomorrow with merging.
I would like to take one more look at this PR tomorrow, mostly to check if there are any major performance issues or some small documentation improvements.
Sure, if you find something and do a commit, you can also adapt the date in the changelog. I can check back here tomorrow afternoon then, since I will be on a hike tomorrow (one we can also do when you are here).
Performance is not great, but to improve that I need to extend `PowerManifold` a bit first. I think this would be better suited for a separate PR.
I think so as well; this took quite a while to get done, so working is my first goal here, performance the next one. I did work through most of the access functions used for constraints and, for example, used the single-access ones for performance.
This is my (third) summer project: initial code for a Riemannian Interior Point Newton method based on the master thesis of @mstokkenes, where of course documentation and testing are not part of a master thesis, but exploring how this algorithm works and illustrating that it works. A Matlab code for the same idea that this is based on (and from the authors of the original paper) is https://github.com/GALVINLAI/RIPM, though here the notation is already much closer to that of Manopt.jl.
🛣️ Roadmap
- `VectorialHessianFunction`s, which are newer than the original code base
- `is_feasible` to work in a more general sense than just for RIPM, and (re-)introduce `DebugFeasibility()`
- `stepsize_objective` and, for the additional check, use `centrality_condition`
- σ and ρ (Section 8.1 YL) and their update