JuliaManifolds / Manopt.jl

🏔️Manopt. jl – Optimization on Manifolds in Julia
http://manoptjl.org
Other
314 stars 40 forks source link

Rework high level interfaces #392

Closed kellertuer closed 3 days ago

kellertuer commented 2 months ago

This is an approach to fix #381

🛣️ Roadmap

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.90%. Comparing base (c8564b8) to head (949457a). Report is 1 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #392 +/- ## ========================================== + Coverage 99.76% 99.90% +0.13% ========================================== Files 77 77 Lines 8256 8220 -36 ========================================== - Hits 8237 8212 -25 + Misses 19 8 -11 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

kellertuer commented 2 months ago

The ::Number ambiguities with types points are all removed.

TCG has some strange ambiguities, that I still have to look at. but all other high-level interfaces do not have ambiguities any longer. The trick was to slightly redefine the methods that “generated” array-functions for number type data.

kellertuer commented 2 months ago

Small update: For now any single of the remaining 6 ambiguities, when trying to fix them, the number only goes further up. I am a bit los on this by now, since trying to reduce ambiguities only increases them (with my limited knowledge).

Some are also due to a deprecated signature so we can maybe leave these out in the end.

Maybe the reasonable solution is to make a breaking release that sets the Hessian to be mandatory, then the user would have to pass the approxHessian themselves.

kellertuer commented 2 months ago

The code with mandatory Hessian is so much cleaner. So this will be part of 0.5.0 – so nothing urgent and I will wait for a few other issues to be resolved and check whether we have other breaking changes we should consider / include then.

kellertuer commented 2 months ago

Oh tests might be a bit tricky, because we first have to bump version on ManoptExamples.jl it seems for the tests.

kellertuer commented 2 months ago

This is mainly waiting for https://github.com/JuliaManifolds/ManoptExamples.jl/pull/16 and is (again) not urgent. So we can also think a bit about what else “to break” to improve.

kellertuer commented 2 months ago

Oh ManoptExamples only works with Julia 1.8, so I would either have to raise the Julia version here as well or run the tests with manoptExamples only on Julia 1.8 but not on 1.6

mateuszbaran commented 2 months ago

Maybe we should make ManoptExamples.jl compatible with Julia 1.6? What's the issue?

kellertuer commented 2 months ago

That would maybe also be an idea, there was something with range.

mateuszbaran commented 2 months ago

Yes, you just need to replace

range(interval[1], interval[2], n)

with

range(interval[1], interval[2]; length=n)
kellertuer commented 2 months ago

Ah will check; probably the best way to loosen the constraints to 1.6 again once I fixed that (maybe I was a bit too eager to finish the weak deps). I am travelling tomorrow so I might only find time on Thursday for that then.

kellertuer commented 2 months ago

Short status update here: With ManoptExamples 0.1.8 tests should be fine again. I also removed all deprecated keywords, but it seems I missed something on reworking the interfaces somewhere since now the cache (some StorageRef?) Fails since it suddenly gets Floats where it expects Vector{Float}s. No clue why and I thought I had that resolved (so I hope it is not due to removing deprecated stuff).

But well. This might hence still take quite a while I feel. but we also can wait until we have more breaking stuff.

kellertuer commented 2 months ago

I think I at least narrowed down the error: now the p we store internally is always mutating (that is on circle and positive numbers it is internally also a [p] if p is a number, but the point_storage win the storage/cache gets initialised to p somethow, so that is numbers – and then conversion fails.

Since this is now consistent in all high level interfaces, we might have to check a few places where internally this is accidentally not the case (that points are always vectors in structures of Manopt)

mateuszbaran commented 2 weeks ago

Since this is a breaking change then maybe it would be a good idea to also tackle #400 here? Especially implementation of the factory pattern.

kellertuer commented 2 weeks ago

Then this becomes a monster-PR again, and I was hoping to tackle that even in a non-breaking way, but sure, we can also do this here :)

If so I would also check for other constructors that might have the same like a step-size or so.

mateuszbaran commented 2 weeks ago

The adjustments necessary for step sizes don't look that complicated to me, especially if we don't have to make them non-breaking.

kellertuer commented 2 weeks ago

No, also the other ones for the gradient rules should not be that complicated and doable nonbreaking I think. That was why I thought we could to that after this PR, but I can check (when I have finished the points still open) how much we can address that here as well, sure.

kellertuer commented 2 weeks ago

I looked at it and I agree – in a breaking way it should be easy to do and also much easier than trying to do it non-breaking. Will add it here, sure. Just that this way the PR might take a bit longer, given that my semester starts next week (and I have resit-exams to administer this week).

mateuszbaran commented 2 weeks ago

Great! I think it is fine to delay a breaking release to resolve that issue with line searches.

kellertuer commented 2 weeks ago

I needed a small trick – the tutorials have to render for this version also on the dev version from the folder, otherwise they would run on 0.4; we could maybe return to normal after the breaking release.

kellertuer commented 2 weeks ago

I fell into a small rabbit hole.

Now we have for states with sulbsolvers the constructors

To make this easier, p= was moved to a keyword. So then I did the same with all other states.

But with this, the constructors are very clear now, though it was a bit of a refactoring. I am thinking about not exporting the states to declare them a bit more internal.

This finishes the main intention of this PR. so only the factory-idea is left here (and code coverage).

kellertuer commented 2 weeks ago

I just worked on a first sketch for the factory idea. I would like some feedback on whether this is what you want @bvdmitri, and from @mateuszbaran whether this is still nice memory wise (I hope so).

The idea is as follows. The factory just wraps the direction update T all args and kwargs; I even added a manifold if one wants to, see

https://github.com/JuliaManifolds/Manopt.jl/blob/3147fb4a77ec64d5c658c45a1edd90e16a7ea50c/src/plans/gradient_plan.jl#L272-L323

(oh the docs are still outdated; the actual type is only a parameter no longer a field)

Then the actual call is either just factory() (works if the internal manifold from before is not nothing) or factory(M) that then passes all args/kwargs to the actual type. There is also a produce_rule that accepts both rules and factories but makes sure the result is a rule

With that both

gradient_descent(SomeManifold(), f, g, p; direction = MomentumGradient())
gradient_descent(SomeManifold(), f, g, p; direction = MomentumGradientRule(M, p))

work, but since the factory also accepts the p as a positional argument also the old version

gradient_descent(SomeManifold(), f, g, p; direction = MomentumGradient(M,p))

works and even

gradient_descent(SomeManifold(), f, g, p; direction = MomentumGradient(p))

(so you can fill any args/kwargs. The last one I like because if you have memory/a point you want to pass you do not longer have to think of the manifold either.

besides that, I feel since p can always be filled with rand(M) I might move that even to a keyword argument in the (more internal) XRule types. The current X types will then always create such a factory.

Let me know

a) what you think (before I rework all other rules and b) wherelse we could provide such a factory pattern

If the factory is ok in its form, it might even work for nearly any other “thing” in Manopt, but my first idea would be

maybe also Debug/Record things, but since they can already usually addressed with symbols (cf. https://manoptjl.org/stable/plans/debug/#Manopt.DebugActionFactory-Tuple{Symbol}), maybe they are already fine.

edit: Since all try to follow the same idea “delay the use of a manifold to fill defaults in args and kwargs” one could even cover that with the factory above – and maybe name it a bit more general DelayedManifoldDefaultsFactory or so.

bvdmitri commented 2 weeks ago

Nice work, I’m currently on vacation, but will be able to provide some feedback next week when return to the office :)

kellertuer commented 2 weeks ago

Cool, enjoy your vacation!

kellertuer commented 2 weeks ago

Since the fix was slightly breaking, I also added the fix for the remaining 2 points in #382 here. Now both the full matrix and the limited memory variant are equivalent in how they perform their updates.

kellertuer commented 1 week ago

Sure, I was just asking about feedback on the factory, the rest is quite a bit of rework because we noticed Aqua was not too happy with a few ambiguities. The reason both are on this PR is, since that will be the next (breaking) version that. So no worries, I was not asking you to review the rest – just the idea of the factory.

Nice. that you like it, so I will continue introducing that to the other areas as well and also use that to review the docs.

kellertuer commented 1 week ago

Concerning hidden bugs, the Average gradient had one, where in some cases the history was not filled correctly and hence it was much closer to gradient descent than averaging. This already has been fixed.

For the rest, sure I'll let you know all the refactoring to the new factory has been finished; probably the main work is even the docs and not the code itself.

kellertuer commented 1 week ago

@bvdmitri thanks for the feedback, I addressed the structural things already; I generalised the factory also in name to a ManifoldDefaultsFactory so it can be used with other things as well. The docs are not yet perfect, but I also have to work through the remaining direction rules, stopping Criteria, and Stepsizes. and will do that during that as well.

An idea is that there is usually an internal type (like the XRules now) that documents fields and internal parts and is also not exported. Compared to that the new X (exported) will document the mathematical part and have the new Note to the internal one with the factory.

Since this involves a few types to rework, this will still take a while, but I plan to maybe refactor a type per day (besides teaching that started today again).

bvdmitri commented 1 week ago

Great amount work @kellertuer ! nicely done, the documentation is indeed important, but the implementation is solid IMO

kellertuer commented 1 week ago

Thanks. As I said I will then work through the changes for some time, but I feel it improves usability quite a bit :)

kellertuer commented 1 week ago

Small update, all direction rules should now work fine with the factory, that is one can now always write

conjugate_gradient_descent(M, f, grad_f; coefficient=FletcherReevesCoefficient())

but one can still overwrite any argument of keyword argument when calling these (still without using a manifold) for example the threshold here can be set factory = ConjugateGradientBealeRestart(; threshold=0.2) – but its vector transport it would need is still only filled with a default later when one passes a manifold to the factory(M) this call creates. This way one can easily still overwrite keywords and even arguments to coefficients or also for the

AverageGradient(; n=20)

this still works fine for all keywords of any of these rules. And test coverage should also by now work. So the only thing on this PR left is to check whether it is reasonable to do this for step sizes (probably) and stopping criteria (not 100% sure, not many need a manifold I think), too. So that we have Armijo() as a factory for ArmijoLinesearch and such; the factory is already generic enough for this, it's mainly a matter of slight adaption of code and a bit larger adaption of documentation.

kellertuer commented 1 week ago

This now maybe also closes #406, #407, #408 (all Manifolds bumps for the package, docs and tutorials).

kellertuer commented 1 week ago

@mateuszbaran I do not seem able here to get RecursiveArrayTools (now a weak dependency here as well) to work fine with Statistics. Any ideas?

Today was rally not my day. My progress over several hours can bee strictly upper bounded by zero; and locally all works fine.

edit: and now I do not know why it works now. Well,...

mateuszbaran commented 6 days ago

Ah, I see you've figured it out already.

kellertuer commented 6 days ago

My main problem is, I neither know what I actually did wrong nor how I fixed it (in basically total confusion). But it works now and for this PR I am even only 2 stepsize rules aways from finishing all rework.

kellertuer commented 6 days ago

I think this is good to go (was a bit more work than I thought) – and turned out quite nice I think

@bvdmitri could you maybe check whether the new docs that provide the new interface are understandable? These are

If you have time, maybe give this new version a test run – if not, that is of course also fine

@mateuszbaran if you have time – the high level interfaces have now been reworked, so there are no warnings from Aqua any longer, especially no ambiguities, start points p are set to rand as default last positional argument whenever reasonable. Throughout I hope this is now consistent – and at least Aqua is happy :)

I would like to merge this maybe towards the end of the week, so if you have time to comment on these changes, that would be great – I think this still works as intended; I hope I have documented all breaking changes. Overall this breaking change is mainly unification of documentation and keyword names and improvements on Usability, since M is less often needed.

bvdmitri commented 5 days ago

I noticed some bad renders in the docs, but otherwise seems good to me :) I also noticed that math doesn't render nicely very often in many docstrings, not sure why is this the case...

Screenshot 2024-08-27 at 08 18 21 Screenshot 2024-08-27 at 08 19 54 Screenshot 2024-08-27 at 08 21 18 Screenshot 2024-08-27 at 08 21 49

Here the English is broken IMO

Screenshot 2024-08-27 at 08 19 16

How about

This function generates a `ManifoldDefaultsFactory`for `ConstantStepsize`.
The manifold specification is optional; if not provided, the manifold will be automatically determined from the optimization procedure. This affects all arguments and keyword arguments with defaults that depend on the manifold, unless explicitly specified here.

Also sometimes its inconsistent as here

Screenshot 2024-08-27 at 08 28 13

(also in other places, e.g. Polyak and PolyakStepsize)

But just let me reiterate that this is a great amount of work you did and its quite impressive!

kellertuer commented 5 days ago

Ah, the info box is a function (with that type as a variable, exactly because I want to just fix the English in one place ;) and maybe I set that in copy. paste not correctly. Will check – also your other findings; thanks for checking!

edit: And thanks again for the nice feedback – that does still not excuse me being unconcentrated when doing all this copy-pase-work – I should probably read the docs one more time myself as well. The typos you found are already fixed :)

kellertuer commented 3 days ago

I might consider in a later PR to also introduce the factory at least to some stopping criteria (StopWhenChangeLess or StopWhenGradientChangeLess or such), but that would also require StopWhenAny and StopWhenAll to be able to handle those, so it is maybe a bit more work – should be doable nonbreaking still I think.

thanks again for all the ideas and contributions!