JuliaManifolds / ManifoldsBase.jl

Basic interface for manifolds in Julia
https://juliamanifolds.github.io/ManifoldsBase.jl/
MIT License
87 stars 8 forks source link

Fused retractions #146

Closed mateuszbaran closed 1 year ago

mateuszbaran commented 1 year ago

Here I'm working on https://github.com/JuliaManifolds/Manifolds.jl/issues/569 .

kellertuer commented 1 year ago

Looking forward to seeing how that turns out. I hope we had good ideas to both improve performance and keeping it nonbreaking :)

mateuszbaran commented 1 year ago

I've worked a bit on this today and retract_pade(M::AbstractManifold, p, X, ::Number) makes me confused due to the two numbers at the end. Or one. It's hard to notice that it actually is well-defined. I think it would be much more clear if we didn't extract number from PadeRetraction there and just passed the PadeRetraction object.

kellertuer commented 1 year ago

It is just one number, since the pade-retraction has an approximation order n that we extract, when we move from level 2 (dispatch on retraction type) to level 3 (retraction-type functions like retract_pade.

But sure if you want to go for fusing stuff, then it would have two arguments that are numbers in the end n::Int, t::Real. Note that currently retract_caley defaults to n=1.

Can you elaborate on the argument why that would not be well-defined? Is it ok if the fix n to be integer?

My argument for the current form ist that retract_pade that has a PadeRetraction{n}() as its last (or with fusion second to last) argument doubles the information that it is a pade-retraction. It also makes the interplay with Caley more complicated. Also the type really just has an integer in its parameters. If possible I would like to keep the n there as a last (non-fused) or second-to-last (fused) argument.

mateuszbaran commented 1 year ago

But sure if you want to go for fusing stuff, then it would have two arguments that are numbers in the end n::Int, t::Real. Note that currently retract_caley defaults to n=1.

Why that order? Right now we have t right after X in exp and retract, before retraction method.

Can you elaborate on the argument why that would not be well-defined? Is it ok if the fix n to be integer?

As long as n doesn't have a default value it is well-defined. If it did have a default value for n, we couldn't be sure if the user wanted to specify n or t by an integer after X. We have retraction methods as optional arguments so expecting n to maybe have a default value is something I had done until I realized it couldn't work with optional t as well.

My argument for the current form ist that retract_pade that has a PadeRetraction{n}() as its last (or with fusion second to last) argument doubles the information that it is a pade-retraction. It also makes the interplay with Caley more complicated. Also the type really just has an integer in its parameters. If possible I would like to keep the n there as a last (non-fused) or second-to-last (fused) argument.

Note that we have different expectations of what retract_pade(M, p, X, 2, 3) is supposed to mean. retract_pade(M, p, X, 2, PadeRetraction(3)) (or retract_pade(M, p, X, PadeRetraction(3), 2)) would make the intention clear. We can use a different type instead of PadeRetraction if you prefer (like Val(3)), just something that is not <:Number.

kellertuer commented 1 year ago

But sure if you want to go for fusing stuff, then it would have two arguments that are numbers in the end n::Int, t::Real. Note that currently retract_caley defaults to n=1.

Why that order? Right now we have t right after X in exp and retract, before retraction method.

I do not have a strong preference here, I was just thinking: It's added at the end – but you are right it's “at the end but before the retraction” and since n specifies the retraction your idea of doing t, n is also very reasonable. Let's go with your choice then, since mine was really only to add t at the end (which is not what we do). Or maybe not, see next point.

Can you elaborate on the argument why that would not be well-defined? Is it ok if the fix n to be integer?

As long as n doesn't have a default value it is well-defined. If it did have a default value for n, we couldn't be sure if the user wanted to specify n or t by an integer after X. We have retraction methods as optional arguments so expecting n to maybe have a default value is something I had done until I realized it couldn't work with optional t as well.

This might actually be an argument to putting t last, since I do not expect n to have a default value, it is a mandatory argument of the retract_pade and only t has a default value. It is mandatory, since it is always a given parameter of PadeRetraction.

My argument for the current form ist that retract_pade that has a PadeRetraction{n}() as its last (or with fusion second to last) argument doubles the information that it is a pade-retraction. It also makes the interplay with Caley more complicated. Also the type really just has an integer in its parameters. If possible I would like to keep the n there as a last (non-fused) or second-to-last (fused) argument.

Note that we have different expectations of what retract_pade(M, p, X, 2, 3) is supposed to mean. retract_pade(M, p, X, 2, PadeRetraction(3)) (or retract_pade(M, p, X, PadeRetraction(3), 2)) would make the intention clear. We can use a different type instead of PadeRetraction if you prefer (like Val(3)), just something that is not <:Number.

For me, the order is clearer after reading this. the n should be the 2 is your first notation, since the t is last in the retract_typeofretraction() functions in level 3. the n for me is a mandatory parameter to I am still for doing

retract_pade(M, p, X, n, t=1.0) where only the t is optional. t can be real or integer, n is always integer. And I think with a proper documentation there is no need to clarify this in the documentation, so I do not see the need to overcomplicate things by introducing types here. edit: We also do not have types for p or X at this point, but their intention is well-documented. In that sense we can also handle the n which is mandatory.

mateuszbaran commented 1 year ago

Sorry but after reading your response I'm only more confident that we do need a type wrapper for n there -- it's just too easy to make a mistake by switching the order of t and n, and you suggest an order that is inconsistent with retract.

kellertuer commented 1 year ago

Hm, I am still not that convinced, in principle Pole and Schield do the same with the number of steps (and even the NLSolve, though only for the inverse). My main argument is that we also do not type p and X and in the same manner for retract_pade the n is a mandatory integer variable added before the t. For example the embedded retraction actually needs certain type, there we have that, but for n I only see that we can restrict it to integer. Still, since it is mandatory, I think there is no confusion (same as the thing that we can not confuse X (on the circle for example) with t. And sure this is only for retract_pade on level 3, I am not speaking about Level 1 or 2 where we keep the type for sure.

to be precise, the signatures we have are (in my model)

in practice these are not exported and not called by a user directly, so the only point where this has an effect is implementing these. There I would prefer to keep it in this simplicity maybe? On the other hand with your argument we also would have to type q,p,X because they could be confused in order?

mateuszbaran commented 1 year ago

Hm, I am still not that convinced, in principle Pole and Schield do the same with the number of steps (and even the NLSolve, though only for the inverse).

Pole and Schild are for vector transport IIRC, not retractions?

in practice these are not exported and not called by a user directly, so the only point where this has an effect is implementing these. There I would prefer to keep it in this simplicity maybe? On the other hand with your argument we also would have to type q,p,X because they could be confused in order?

Our entire interface is designed around not constraining q, p and X so the possibility of confusing them is something we just have to accept. I've made the error multiple times, so this design choice isn't something I particularly like. Until IIRC Julia 1.5 it was also motivated by performance but that's no longer an issue.

If you really hate my suggestion then I won't push on this more but I won't like this interface either.

kellertuer commented 1 year ago

Hm, I am still not that convinced, in principle Pole and Schield do the same with the number of steps (and even the NLSolve, though only for the inverse).

Pole and Schild are for vector transport IIRC, not retractions?

A yes mixed them up, we called them pole_ladder instead of vector_transport_to_pole or something but basically they have the same thing.

in practice these are not exported and not called by a user directly, so the only point where this has an effect is implementing these. There I would prefer to keep it in this simplicity maybe? On the other hand with your argument we also would have to type q,p,X because they could be confused in order?

Our entire interface is designed around not constraining q, p and X so the possibility of confusing them is something we just have to accept. I've made the error multiple times, so this design choice isn't something I particularly like. Until IIRC Julia 1.5 it was also motivated by performance but that's no longer an issue.

If you really hate my suggestion then I won't push on this more but I won't like this interface either.

First of all I like the untyped p,q,X since it allows for arbitrary things to be used with the packages, for example Arrays or StaticArrays without having to introduce wrapper types like MPointArray or something. I feel that might lead to overtyping and too many end users having to think about introducing the right types instead of things “just working” (if used in right order). Sure this is not a black-white situation but a spectrum, but I feel the current state is a good mix. Don't you? And since we have a very consistent order of arguments, I think the confusion should not be so large. And I think the same about the n.

Though in the end, I mean for this PR, we just have to decide on one thing. I do not have a strong opinion here, so if you feel that a type for the n or passing down the retraction is really necessary, I would prefer passing down the retraction type, since that avoids introducing yet-another-type. For me personally n here as the oder of pade has a quite concrete meaning so just keeping it an integer would be my preferred way.

IN short: My preferred way is to not type and keep the n as in the signatures I wrote above, but if you feel a type is necessary since two numbers as arguments might be confusing (I very much think that is not the case), then let's pass down the PadeRetraction.

mateuszbaran commented 1 year ago

Sure this is not a black-white situation but a spectrum, but I feel the current state is a good mix. Don't you? And since we have a very consistent order of arguments, I think the confusion should not be so large. And I think the same about the n.

I generally like our interface, I just wanted to point out that there are some drawbacks of not wrapping things. There are also drawbacks of wrapping things but a single layer of wrapping is rarely an issue in my experience.

Part of the issue with retract_pade is that its order or arguments is inconsistent with retract. Of course order of arguments for two different functions doesn't have to be consistent but they are somewhat related.

IN short: My preferred way is to not type and keep the n as in the signatures I wrote above, but if you feel a type is necessary since two numbers as arguments might be confusing (I very much think that is not the case), then let's pass down the PadeRetraction.

OK, I think I will pass down PadeRetraction then.

kellertuer commented 1 year ago

Sure this is not a black-white situation but a spectrum, but I feel the current state is a good mix. Don't you? And since we have a very consistent order of arguments, I think the confusion should not be so large. And I think the same about the n.

I generally like our interface, I just wanted to point out that there are some drawbacks of not wrapping things. There are also drawbacks of wrapping things but a single layer of wrapping is rarely an issue in my experience.

Ok, I feel wrapping might be less usable for an end-user who just has arrays and wants to use those. So that is not a computational drawback but a usability one, that is might feel more complicated.

Part of the issue with retract_pade is that its order or arguments is inconsistent with retract. Of course order of arguments for two different functions doesn't have to be consistent but they are somewhat related.

Could you elaborate a bit on that since I feel if “retraction goes last if it is there” and “t is before that if it is there” are the assumptions we do, then this is consistent.

IN short: My preferred way is to not type and keep the n as in the signatures I wrote above, but if you feel a type is necessary since two numbers as arguments might be confusing (I very much think that is not the case), then let's pass down the PadeRetraction.

OK, I think I will pass down PadeRetraction then.

Ok, as I said not my favourite one, but one I am ok with, we have the embedded_retraction which does that as well because it needs a retraction there, so that is not really inconsistent. And actually one could dispatch on PadeRetraction{1} if needed for Caley? Also nice.

mateuszbaran commented 1 year ago

Sure this is not a black-white situation but a spectrum, but I feel the current state is a good mix. Don't you? And since we have a very consistent order of arguments, I think the confusion should not be so large. And I think the same about the n.

I generally like our interface, I just wanted to point out that there are some drawbacks of not wrapping things. There are also drawbacks of wrapping things but a single layer of wrapping is rarely an issue in my experience.

Ok, I feel wrapping might be less usable for an end-user who just has arrays and wants to use those. So that is not a computational drawback but a usability one, that is might feel more complicated.

Sure, I think we should change that part of our interface.

Part of the issue with retract_pade is that its order or arguments is inconsistent with retract. Of course order of arguments for two different functions doesn't have to be consistent but they are somewhat related.

Could you elaborate a bit on that since I feel if “retraction goes last if it is there” and “t is before that if it is there” are the assumptions we do, then this is consistent.

Yes, these are the assumptions I have in mind but in your suggestion retraction-related information (n) goes before t.

And actually one could dispatch on PadeRetraction{1} if needed for Caley? Also nice.

Yes, begin able to dispatch on n is a small benefit of not unwrapping, so we don't even need retract_caley then (which has a typo BTW...)

kellertuer commented 1 year ago

Sure this is not a black-white situation but a spectrum, but I feel the current state is a good mix. Don't you? And since we have a very consistent order of arguments, I think the confusion should not be so large. And I think the same about the n.

I generally like our interface, I just wanted to point out that there are some drawbacks of not wrapping things. There are also drawbacks of wrapping things but a single layer of wrapping is rarely an issue in my experience.

Ok, I feel wrapping might be less usable for an end-user who just has arrays and wants to use those. So that is not a computational drawback but a usability one, that is might feel more complicated.

Sure, I think we should change that part of our interface.

Part of the issue with retract_pade is that its order or arguments is inconsistent with retract. Of course order of arguments for two different functions doesn't have to be consistent but they are somewhat related.

Could you elaborate a bit on that since I feel if “retraction goes last if it is there” and “t is before that if it is there” are the assumptions we do, then this is consistent.

Yes, these are the assumptions I have in mind but in your suggestion retraction-related information (n) goes before t.

Retraction (type) goes last, not retraction information – but well, as I said let's keep the retraction type also for the dispatch argument.

And actually one could dispatch on PadeRetraction{1} if needed for Caley? Also nice.

Yes, begin able to dispatch on n is a small benefit of not unwrapping, so we don't even need retract_caley then (which has a typo BTW...)

Oh! In deed, let's deprecate the typo one then. I think retract_caley should be kept, though merely for a user who has caley in mind, wants to overwrite that without knowing/thinking about that it is a special case of pade. So, keeping caley is mainly for the users convenience again (that it is another name for pade(1)).

mateuszbaran commented 1 year ago

I've changed DefaultManifold to store size in a field to make it do less compilation. Actually very little can be optimized by statically knowing the dimension.

mateuszbaran commented 1 year ago

By the way, I'm up to about Stiefel in updating Manifolds.jl to these changes. I will make a PR to Manifolds.jl when tests pass locally.

codecov[bot] commented 1 year ago

Codecov Report

Merging #146 (aef5a0a) into master (65e9263) will decrease coverage by 0.04%. The diff coverage is 100.00%.

:exclamation: Current head aef5a0a differs from pull request most recent head d60387a. Consider uploading reports for the commit d60387a to get more accurate results

@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
- Coverage   99.87%   99.84%   -0.04%     
==========================================
  Files          19       19              
  Lines        2464     2522      +58     
==========================================
+ Hits         2461     2518      +57     
- Misses          3        4       +1     
Impacted Files Coverage Δ
src/DefaultManifold.jl 100.00% <100.00%> (ø)
src/ManifoldsBase.jl 98.75% <100.00%> (-0.80%) :arrow_down:
src/PowerManifold.jl 99.81% <100.00%> (-0.19%) :arrow_down:
src/decorator_trait.jl 100.00% <100.00%> (ø)
src/exp_log_geo.jl 100.00% <100.00%> (ø)
src/nested_trait.jl 100.00% <100.00%> (ø)
src/point_vector_fallbacks.jl 100.00% <100.00%> (+0.74%) :arrow_up:
src/retractions.jl 100.00% <100.00%> (ø)
... and 3 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

kellertuer commented 1 year ago

If you say that thats easier to handle, sure the DefaultManifold can be changed.

Nice, I assume it is a bit of work to get the new versions fast?

mateuszbaran commented 1 year ago

Nice, I assume it is a bit of work to get the new versions fast?

Most of the time it's either easy or there doesn't seem to be much room for improvement over just multiplying the vector by t. At the moment the biggest problem is that I'm not sure how to multiply StiefelFactorization by t. Yes, it's defined as an MPoint but it's also used to represent tangent vectors.

kellertuer commented 1 year ago

I can check tomorrow, but I thinking's mainly some caching (maybe for Cholesky?).

kellertuer commented 1 year ago

Oh yes that is quite unfortunate that the StiefelFactorization is an MPoint but also used as a tangent vector, it should probably just be struct? At first sight, I do not see direct how the multiplication with t works here and would have to check the ZimmermannHüper paper more closely.

mateuszbaran commented 1 year ago

Yes, I'd prefer StiefelFactorization to just be a struct. I'd expect the scaling to be just multiplication of the Z part by t but I'm not fully sure.

kellertuer commented 1 year ago

Then lets just make it a struct and as far as I see it might indeed just be the Z, yes.

mateuszbaran commented 1 year ago

I just discovered that #134 made isapprox quite slow -- something like 35 microseconds when actual isapprox on arrays takes about 60 ns. My last commit got it down to about 1.2 us but to make it acceptable I need to change the interface of function check_approx(M::AbstractManifold, p, q; kwargs...) a bit -- we can't have it generate a string when all we need is a yes/no response. Do you have a preferred way to handle this @kellertuer ?

mateuszbaran commented 1 year ago

BTW, I discovered the issue when I was investigating why statistics.jl tests in Manifolds.jl take so long to finish (about 20 minutes on my computer). A large part of that was calling isapprox in median calculation.

kellertuer commented 1 year ago

I just discovered that #134 made isapprox quite slow -- something like 35 microseconds when actual isapprox on arrays takes about 60 ns. My last commit got it down to about 1.2 us but to make it acceptable I need to change the interface of function check_approx(M::AbstractManifold, p, q; kwargs...) a bit -- we can't have it generate a string when all we need is a yes/no response. Do you have a preferred way to handle this @kellertuer ?

I am often amazed what you find :) Since that is an internal function sure we can change it. But to my (though untrained) eye, the new shortcut-return should already the point that no string is generated?

I do not have a preferred way of handling this, no, if you have a good idea, feel free to propose it here.

mateuszbaran commented 1 year ago

I am often amazed what you find :)

Thanks. I can show you how I do it if you want :). It's not that complicated.

Since that is an internal function sure we can change it. But to my (though untrained) eye, the new shortcut-return should already the point that no string is generated?

This shortcut skips creating one string, but check_approx(M::AbstractManifold, p, q; kwargs...) still makes one.

I'm not sure what the best solution here is. Currently manifolds implement isapprox ignoring the new interface. We probably need the fast yes/no implementations to just coexist next to the ones providing errors. The logic is going to be doubled but I feel that the cost of not doubling it and providing full functionality is too large.

is_point/check_point functions may have a similar issue but they are less likely to occur in a performance-critical code.

kellertuer commented 1 year ago

I thought we even introduced the check_ functions to split something already.

I am not so much a fan of doubling, but sure if that is necessary for performance, then let‘s do that. If it is not too much work, I would prefer doing the same for point/vector as well, because that would be consistent.

mateuszbaran commented 1 year ago

There wouldn't be that much doubling in isapprox because it has few methods and they are relatively simple. Doing something similar for is_point/is_vector would be much more doubling though, and the speed gains would be less important. So I'm not sure it's a good idea at this time.

kellertuer commented 1 year ago

That‘s why I phrased it carefully, then the inconsistency is ok and we just adapt isapprox, of course.

mateuszbaran commented 1 year ago

Manifolds.jl tests pass for me locally now with these changes, so I think you can start looking at this :slightly_smiling_face: . I will still investigate a bit more if these are all the changes we need at the moment, since it's breaking, and will make a short list of changes.

kellertuer commented 1 year ago

Will check the fusion soon, for now just 2 comments on the change in isapprox.

mateuszbaran commented 1 year ago

I've added a changelog, what do you think?

kellertuer commented 1 year ago

I've added a changelog, what do you think?

Looks good, thanks for starting this. We should do this in the other packages as well, at least for breaking changes/releases. And the changelog contains the one criticism I see in the nicer, faster variants – the user has to think of this and implement a “slightly more complicated version”. Should we adapt the tutorial to that? Se https://juliamanifolds.github.io/ManifoldsBase.jl/stable/example.html#manifold-tutorial-fn and maybe even the technical note before?

mateuszbaran commented 1 year ago

We should do this in the other packages as well, at least for breaking changes/releases.

Yes, that would be nice. At least this is a start :slightly_smiling_face: .

And the changelog contains the one criticism I see in the nicer, faster variants – the user has to think of this and implement a “slightly more complicated version”. Should we adapt the tutorial to that? Se https://juliamanifolds.github.io/ManifoldsBase.jl/stable/example.html#manifold-tutorial-fn and maybe even the technical note before?

Sure, I forgot about it. I will modify the tutorial.

mateuszbaran commented 1 year ago

I think everything looks fine now and this can be merged :slightly_smiling_face: .