JuliaManifolds / ManifoldDiff.jl

Differentiation on manifolds
https://juliamanifolds.github.io/ManifoldDiff.jl/
MIT License

An Interface and first steps towards a Riemannian Gradient and a Riemannian Hessian #27

Closed kellertuer closed 1 year ago

kellertuer commented 4 years ago

This thread -- as a continuation of the discussion started in JuliaManifolds/Manifolds.jl#27 -- shall collect ideas and approaches to the Riemannian Gradient (for example employing tools like the egrad2rgrad from Manopt) and the Riemannian Hessian. Both should be accompanied by a thorough theoretical documentation. I am not completely sure how far this can be generalized to get gradients of arbitrary functions, but let's see how far we get in this discussion.

kellertuer commented 4 years ago

A first possible approach is differentials of the exponential map, the distance function, the log function, and geodesics using Jacobi fields, see for example doi:10.3389/fams.2018.00059 – of course we did not invent that, and the computation of Jacobi fields employed there is limited to symmetric manifolds, where they can be given in closed form (which is actually the reason for the tangent_orthonormal_basis). Their adjoint differentials can also be computed easily. I used that to compute (merely by recursion for exactly that case) the gradient of a function F that maps a set of points through geodesics and a distance to a real number (Bézier curves, as you might have guessed from the paper title).

If functions are built from the above-mentioned blocks, I am sure this can be automated (differentials and their adjoints to compute the gradient), and if the manifold is not symmetric one can still solve the Jacobi fields by ODEs.

Note that in the above paper we were a little lazy and did not distinguish between tangent and cotangent vectors; I would prefer to do that in such an implementation.

The Hessian is for example given by taking such a gradient field and taking the (Riemannian) derivative of that field along a further direction (indicated by another cotangent vector). While I am not 100% sure whether that can be computed as canonically as the above-mentioned gradients, it would be nice to know how much one can do there automatically.

If the manifold is isometrically embedded, both the Riemannian gradient and the Hessian can be derived from the (Euclidean) gradient and Hessian in the embedding. That is what Manopt and pymanopt have their egrad2rgrad and ehess2rhess for. Maybe also along these lines (for nicely embedded manifolds) one can do something with AD, by using standard AD and afterwards the aforementioned transforms from (py)Manopt.
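
On the unit sphere, for example, this conversion amounts to nothing more than the orthogonal projection of the Euclidean gradient onto the tangent space at p. A minimal illustrative sketch (the function name is made up here, not Manopt's API):

using LinearAlgebra

# Riemannian gradient on the unit sphere: project the Euclidean gradient onto T_pM
egrad2rgrad_sphere(p, egrad) = egrad .- dot(p, egrad) .* p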

mateuszbaran commented 4 years ago

There is quite a lot to digest in that paper, though I'd assume that ideally one would for example just AD through the geodesic function and apply egrad2rgrad to the result? I have to go back to my earlier experiments and look at it.

kellertuer commented 4 years ago

Yes, it gets a little technical in that paper; look at Eq. 9, that's the chain rule on manifolds. If you look at (10), that's the differential of a geodesic with respect to its starting point, and finally Sec. 4.2 (the beginning is enough) covers the adjoint Jacobi fields (they turned a few $\circ$s into $^\circ$s there in the HTML version, but, well), including the idea to get from h(F(x)) (F: M->M, h: M->R) via its differential (chain rule) to the gradient. That all works completely without egrad2rgrad there. But if h is not just a distance but something where you need that – that would also be fine.

mateuszbaran commented 4 years ago

I was thinking a bit about AD and correct me if I'm wrong but I think at least in two simple cases we can use standard AD here: derivative of a manifold-valued curve and function gradients.

First, the derivative of $f: R -> M$ at $t$. I don't have a formal proof, but shouldn't it be equal to the derivative of $log(f(t), f(t+u))$ with respect to $u$ at $u=0$? At least in the most common case of isometrically embedded tangent spaces.

For the gradient of $f: M -> R$ at $p \in M$, we can do a different trick. Let $e_1, e_2, ... e_n$ be the orthonormal basis of $T_p M$ and $g(h) = \sum_i h_i * e_i$ for $h = (h_1, h_2, ..., h_n)$. Now, the function $k(h) = f(exp_p(g(h)))$ should be nice enough for standard AD, so we get coefficients for the basis vectors from gradient of $k$ at zero.
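
A quick self-contained sketch of this idea on Sphere(2) (f, p, and the basis vectors are illustrative choices; the projection retraction (p+X)/|p+X| stands in for exp_p here, which it matches to first order and which keeps the composition smooth at h = 0 for ForwardDiff):

using ForwardDiff, LinearAlgebra

p  = [1.0, 0.0, 0.0]                        # base point on the sphere
e1 = [0.0, 1.0, 0.0]                        # orthonormal basis of T_pM
e2 = [0.0, 0.0, 1.0]
f(q) = q[2]^2 + q[3]                        # some smooth function, restricted to the sphere
retr(p, X) = (p .+ X) ./ norm(p .+ X)       # first-order stand-in for exp_p
k(h) = f(retr(p, h[1] .* e1 .+ h[2] .* e2))
coeffs = ForwardDiff.gradient(k, zeros(2))  # coefficients of the gradient in the basis
rgrad  = coeffs[1] .* e1 .+ coeffs[2] .* e2 # candidate Riemannian gradient, tangent at p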

Does this look right, @kellertuer ?

kellertuer commented 4 years ago

First, the derivative of $f: R -> M$ at $t$. I don't have a formal proof, but shouldn't it be equal to the derivative of $log(f(t), f(t+u))$ with respect to $u$ at $u=0$? At least in the most common case of isometrically embedded tangent spaces.

That would nearly be the same as a forward difference? I am not sure whether I get your notation correctly, so let me try to rephrase what I understood:

Given f: R -> M and a point t (in R) where we want to get the (covariant) derivative, as well as a step size h. We need two function evaluations, a = f(t) and b = f(t+h).

Then the derivative f'(t) can be approximated by 1/h * log(M,a,b). I think you might have missed the division by the step size (that's u in your notation?).

For the gradient of $f: M -> R$ at $p \in M$, we can do a different trick. Let $e_1, e_2, ... e_n$ be the orthonormal basis of $T_p M$ and $g(h) = \sum_i h_i * e_i$ for $h = (h_1, h_2, ..., h_n)$. Now, the function $k(h) = f(exp_p(g(h)))$ should be nice enough for standard AD, so we get coefficients for the basis vectors from gradient of $k$ at zero.

I am not sure about this, basically for notation reasons. The gradient is the Riesz representer of the differential, and with your notation I can get Df(p)[h_1 e_1], where h contains the step sizes (in the d directions on M) and e_1,...,e_n is the tangent ONB in TpM. But how did you get from this differential to the Riesz representer?

mateuszbaran commented 4 years ago

Given f: R -> M and a point t (in R) where we want to get the (covariant) derivative, as well as a step size h. We need two function evaluations, a = f(t) and b = f(t+h).

Then the derivative f'(t) can be approximated by 1/h * log(M,a,b). I think you might have missed the division by the step size (that's u in your notation?).

Well, AD doesn't need two evaluations. That's one of the great things about it. If you apply L'Hôpital's rule to $\lim_{h \to 0} 1/h * log(M,a,b)$ you get $\lim_{h \to 0} d/dh \, log(M,a,b)$, and you get what I've written (assuming the derivative is continuous at 0 etc.). $d/dh \, log(M,a,b)$ will be calculated using AD.

I am not sure about this, basically for notation reasons. The gradient is the Riesz representer of the differential, and with your notation I can get Df(p)[h_1 e_1], where h contains the step sizes (in the d directions on M) and e_1,...,e_n is the tangent ONB in TpM. But how did you get from this differential to the Riesz representer?

I'll try to make it more formal; in that post I just threw out a general idea that looks promising. It seems like exp, log, and an orthonormal basis should be enough to avoid writing egrad2rgrad etc. in many cases.

kellertuer commented 4 years ago

Ah, I did not think about going to L'Hôpital – yes. And I can compute the derivative of the log (hm, until now only available for example in the MVIRT Matlab toolbox, but I want to transfer that anyway; that's also why I wanted the tangent_orthonormal_basis (see SPDs) and why that includes curvature information, which you need for d/dh log, if I see that correctly).

I hope that for several cases we can avoid e...2r... functions, yes; Looking forward to reading more :)

mateuszbaran commented 4 years ago

Well, I do not need the derivative of log per se; $d/dh \log(f(t), f(t+h))$ at $h=0$ is a perfectly fine function for standard AD. To be clear, I think this should work:

using Manifolds, ForwardDiff
M = Sphere(2)
f(t) = some point on M           # placeholder: any smooth curve t -> M
df(t) = ForwardDiff.derivative(h -> log(M, f(t), f(t+h)), 0)

and df(t) should be the right tangent vector representation for the derivative of f at t.

kellertuer commented 4 years ago

I think in my mind there is something missing, either a differential of the log (but maybe I got you wrong there) or the h, but I see that might be taken care of by ForwardDiff. Then this should work (for real-valued functions defined on manifolds), since basically the log replaces the (finite) difference.

mateuszbaran commented 4 years ago

Hmm... there might be a small semantic issue. For a function f: M -> R, some places define the gradient of f at p \in M as an element of T_pM and some as an element of T_p*M (the cotangent space), although both variants would be represented in a computer in the same form (as you said, the covector/linear form would be stored as its Riesz representer). So for the discussion here we could limit ourselves to tangent vectors.

Similarly to my previous code,

using Manifolds, ForwardDiff
M = Sphere(2)
f(p) = some combination of coordinates of p   # placeholder: any smooth f: M -> R
df(p) = ForwardDiff.gradient(h -> f(exp(M, p, project_tangent(M, p, h))), zero_tangent_vector(M, p))

should, I think, return the tangent vector corresponding to the direction of steepest ascent of $f$. Again, f(exp(M, p, project_tangent(M, p, h))) linearizes f around p, in the sense that h is supposed to belong to the tangent space at p (project_tangent ensures that we get a real tangent vector there). This all hinges on the assumption that tangent spaces are represented by linear subspaces of R^n. We use exp and log to linearize f and then treat everything as a black box for ForwardDiff, Zygote or something else. Finite differences would also work.

mateuszbaran commented 4 years ago

BTW, Exp/log could also be replaced with retraction and inverse retraction. Correct behavior up to first order should be enough.

I'm not quite sure how to turn this into theorems. They would probably be very boring from the geometric perspective anyway; I'm doing my best to sidestep the actual differential geometry here.
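
For illustration, a tiny sketch of the curve-derivative version with a hand-written projection-style inverse retraction on the sphere (the curve f is just an arbitrary example; the inverse retraction replaces log to first order):

using ForwardDiff, LinearAlgebra

invretr(p, q) = q ./ dot(p, q) .- p    # projection-style inverse retraction on the unit sphere
f(t) = [cos(t), sin(t), 0.0]           # an illustrative curve on Sphere(2)
df(t) = ForwardDiff.derivative(h -> invretr(f(t), f(t + h)), 0.0)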

mateuszbaran commented 4 years ago

If all you have is a hammer, everything looks like a nail and here I have the boring Euclidean AD...

kellertuer commented 4 years ago

The gradient is always a tangent vector; the differential always yields a cotangent.

I would avoid using project here, since only tangent vectors are allowed as the last argument of the exp (or retraction, you're right) anyway.

mateuszbaran commented 4 years ago

I would avoid using project here, since only tangent vectors are allowed as the last argument of the exp (or retraction, you're right) anyway.

Projection is required exactly because only tangent vectors are allowed there. AD doesn't know that, and I think this is the simplest way to enforce that constraint. I've used the orthonormal basis a few posts above instead of projection, so that's also an option (although a more convoluted one).

kellertuer commented 4 years ago

I don't feel comfortable with this approach, since you can enter that computation with anything (i.e. non-tangents) and would neither notice that nor whether the computation is erroneous. For me that feels too much like you just hope everything is right, and you can barely check with all the projections in between (and by the way, egrad2rgrad is in most cases nothing more than a tangent space projection).

Although the only alternative I see is to extend AD to TVector and CoTVector types (for their duals) to make sure of that?

sethaxen commented 4 years ago

The gradient is always a tangent vector; the differential always yields a cotangent.

My differential geometry is not good, but isn't the gradient in fact a cotangent vector?

mateuszbaran commented 4 years ago

I don't feel comfortable with this approach, since you can enter that computation with anything (i.e. non-tangents) and would neither notice that nor whether the computation is erroneous.

Projection is performed on h, which is not exposed to the user. The projection is used just so that only valid arguments are passed to exp. We can do that in any way.

For me that feels too much like you just hope everything is right, and you can barely check with all the projections in between

There is only one projection, and it isn't even necessary. It's true that there is more hope than solid evidence at the moment, but as I see it, to have AD we can either work out the details in the direction I've sketched or write an AD system from scratch.

(and by the way, egrad2rgrad is in most cases nothing more than a tangent space projection).

It's a bit more complicated in the case of, for example, Lie groups that represent tangent vectors in the tangent space at the identity, or vector bundles. These are two examples that I keep in mind. BTW, in which cases is egrad2rgrad just a tangent space projection?

mateuszbaran commented 4 years ago

Well, I just noticed that my method of calculating gradients is just an application of the definition of the directional derivative over the whole basis of the tangent space at a point. It's just using exp(t*v) as the curve from the definition.
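
Written out (just restating the standard definition for reference): for an orthonormal basis $e_1, \dots, e_n$ of $T_pM$,

$\langle \operatorname{grad} f(p), e_i \rangle_p = \mathrm{d}f_p(e_i) = \frac{\mathrm{d}}{\mathrm{d}t} f(\exp_p(t e_i))\big|_{t=0},$

so the coefficients of the gradient in that basis are exactly the directional derivatives along the curves $t \mapsto \exp_p(t e_i)$.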

kellertuer commented 4 years ago

Ok, also with the question of @sethaxen – let's make all this a little more rigorous and look at, e.g., the book of Udriste (1994):

While I am ok with the decomposition into an ONB, I prefer not to use projection. What's the difference? Let's look at the Sphere(2): the h mentioned above, which carries the coefficients of v = h[1]e1 + h[2]e2 for the ONB e1, e2, is from R^2 (while v, e1, e2 are from R^3); that is a decomposition of v, not a projection. And I prefer using h since the dimensions are correct.

And yes, you're right, it's then basically just using exp. But I still fear we have to be careful about what we plug in and what we get out. We should carefully model that such that only tangents are possible. Similar to the primals and duals that ForwardDiff uses, I would prefer to use tangents and cotangents in an analogous way.

mateuszbaran commented 4 years ago

I think I know how to make this more formal and more general. I'll write it down and send you when it's ready.

mateuszbaran commented 4 years ago

After working it out more carefully (see attachment), it turned out that everything can be justified except the projection thing. You were right about it. Do you see any mistakes in my description?

We should carefully model that such that only tangents are possible. Similar to the primals and duals that ForwardDiff uses, I would prefer to use tangents and cotangents in an analogous way.

That's a valid concern; right now I'm only considering tangent vectors for simplicity. We need to think about where and how cotangent vectors are used.

Riemannian_AD.pdf

kellertuer commented 4 years ago

Thanks for the details, it's always good to write down precisely what you mean/intend.

Concerning the last remark – in this notation one just has to make sure that the vectors in U_M are not too long; since the dimension is already correct, it is fine locally (around zero in U_M).

mateuszbaran commented 4 years ago
  • from (2) to (3) you actually do a finite difference approximation (or rather a generalisation thereof, since you have a log and not a difference) – and the approximation is quite good for small vectors v.

Right, I'll correct it. I only need that for very small v so it should be fine.

  • I would prefer to do the steps with TpM keeping in mind that you can have a vector representation v = h_1 e_1 + ... + h_n e_n for an ONB E = {e_1, ..., e_n} of TpM (and similarly for T_{f(p)}N) but that might be seen as a more specific version of your embeddings.

Hmm, yes, that might be easier to read, I'll try it. By the way, we don't use the orthonormality assumption anywhere, right? I ask because it might sometimes be easier to obtain a non-orthonormal basis for TpM in practice.

  • (7) and (8) look quite similar since U_M is R^\hat{m}, but one could do a (9) with the ONB approach above (it still yields something quite similar, but might be easier to see since it is closer to df than to \hat{df}).

Concerning the last remark – in this notation one just has to make sure that the vectors in U_M are not too long; since the dimension is already correct, it is fine locally (around zero in U_M).

Right, good point. I'll change that. Thanks for feedback!

kellertuer commented 4 years ago

Right, I'll correct it. I only need that for very small v so it should be fine.

Yes, what you are doing in the end is fine; I am just criticising that it looks like (2) and (3) are equal, while in fact (3) approximates (2) – and for small v that is still good enough.

Hmm, yes, that might be easier to read, I'll try it. By the way, we don't use the orthonormality assumption anywhere, right? I ask because it might sometimes be easier to obtain a non-orthonormal basis for TpM in practice.

I would not drop that, since with Gram–Schmidt you can easily get an ONB, and then the decomposition is easier than for non-orthonormal bases.

mateuszbaran commented 4 years ago

I've made some edits to reflect that (3) is an approximation and to use ONBs everywhere. I've also introduced \tilde{\mathrm{d}f_p} to denote the function that will actually be implemented. I'm wondering now where I should expand on the ONB of TpM itself. The parts that I actually need are Eqs. (7) and (8), since I can implement these two functions on a computer and calculate their Jacobians at (0,0,...,0) to get \hat{\mathrm{d}f_p}.

Here is the LaTeX source in case you wanted to do some edits:

\documentclass{article}
\usepackage[utf8]{inputenc}

\usepackage{amsmath}
\usepackage{amssymb}

\begin{document}

Let $M$ be a Riemannian manifold embedded in $\mathbb{R}^{\hat{m}}$ by $h_M\colon M \to \mathbb{R}^{\hat{m}}$ and $N$ be a Riemannian manifold embedded in $\mathbb{R}^{\hat{n}}$ by $h_N\colon N \to \mathbb{R}^{\hat{n}}$.
Let $p$ be a point on $M$ with a neighborhood $M_{p} \subseteq M$.
Let $f \colon M_{p} \to N$ be a $C^1$ map.
The embeddings are not necessarily isometric but they must be at least $C^1$ and invertible in a neighborhood of $p$ (or, respectively, $f(p)$).

The differential $\mathrm{d} f_p \colon T_p M \to T_{f(p)} N$ is the function such that for any curve $\gamma \colon \mathbb{R} \to M$ with $\gamma(0) = p$ and $\gamma'(0) = v$ (that is, $\gamma$ corresponds to the tangent vector $v\in T_{p}M$), we have
\begin{equation}
\label{eq:differential}
    \mathrm{d}f_p(\gamma'(0)) = (f \circ \gamma)'(0).
\end{equation}
For example $\gamma(t) = \exp_p(tv)$. Thus
\begin{equation}
\label{eq:differential-2}
    \mathrm{d}f_p(v) = \frac{\mathrm{d}}{\mathrm{d} t}f(\exp_p(tv))\Big|_{t=0}.
\end{equation}
This can also be approximated by
\begin{equation}
\label{eq:differential-3}
    \mathrm{d}f_p(v) \approx \log_{f(p)}( f(\exp_p(v)))
\end{equation}
for small vectors $v$. For sufficiently small vectors $v$ all functions are defined.

We can represent the function $f$ as a mapping between subsets of the spaces $M_p$ and $N$ are embedded in, $\hat{f} \colon h_M(M_p) \to \mathbb{R}^{\hat{n}}$:
\begin{equation}
    \hat{f}(x) = h_N(f(h_M^{-1}(x)))
\end{equation}
for any $x\in h_M(M_p)$.

Similarly, let us assume that $T_p M$ is embedded by $h_{T_p M}$ as a linear subspace $U_M$ of $\mathbb{R}^{\hat{m}}$ and $T_{f(p)} N$ is embedded by $h_{T_{f(p)} N}$ as a linear subspace $U_N$ of $\mathbb{R}^{\hat{n}}$.
Using these, we can represent the differential $\mathrm{d}f_p$ in the embedding by $\hat{\mathrm{d}f_p} \colon U_M \to U_N$:
\begin{equation}
    \hat{\mathrm{d}f_p}(v) = h_{T_{f(p)} N}(\mathrm{d}f_p (h_{T_p M}^{-1}(v)))
\end{equation}
for any $v \in U_M$.
In this setting $\hat{\mathrm{d}f_p}$ is just a linear transformation between two vector subspaces. It is thus completely determined by, for example, its values on a basis of $U_M$.

Substituting everything we get
\begin{equation}
    \label{eq:lots-of-circs}
    \hat{\mathrm{d}f_p}(v) \approx \tilde{\mathrm{d}f_p}(v) = (h_{T_{f(p)} N}\circ \log_{f(p)}\circ h_N^{-1} \circ \hat{f}\circ h_M \circ \exp_p \circ h_{T_p M}^{-1})(v)
\end{equation}
which might look useless until we notice that we can calculate values of $h_{T_{f(p)} N}\circ \log_{f(p)}\circ h_N^{-1}$, $\hat{f}$ and $h_M \circ \exp_p \circ h_{T_p M}^{-1}$ easily in our computer programs.

One way forward now is to put different vectors $v$ into Eq.~\eqref{eq:lots-of-circs} and see what is returned. We could, however, take an orthonormal basis $v_1, v_2, \dots, v_m$ of $U_M$, define
\begin{equation}
    g(t_1, t_2, \dots, t_m) = \tilde{\mathrm{d}f_p}\left(\sum_{i=1}^m t_i v_i\right)
\end{equation}
and calculate the Jacobian of $g$ at zero using automatic differentiation to get an easy method of computing $\hat{\mathrm{d}f_p}(v)$.

Alternatively, we could take an orthonormal basis $v_1, v_2, \dots, v_{\hat{m}}$ of $\mathbb{R}^{\hat{m}}$ (the space $T_p M$ is embedded in), define
\begin{equation}
    g_2(t_1, t_2, \dots, t_{\hat{m}}) = \tilde{\mathrm{d}f_p}\left(\sum_{i=1}^{\hat{m}} t_i v_i\right)
\end{equation}
and calculate the Jacobian of $g_2$. As long as it is given a vector from $U_M$, the expected result will be returned, although care must be taken to avoid giving $g_2$ coefficients $t_i$ that do not correspond to a vector from $U_M$.

\end{document}
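
A rough Julia sketch of the Jacobian computation described above (in the spirit of the function g), for a map between two spheres (f, p, and the basis are illustrative choices; projection-style retractions stand in for exp and log, which is correct to first order and keeps everything smooth at the zero vector for ForwardDiff):

using ForwardDiff, LinearAlgebra

retr(p, X)    = (p .+ X) ./ norm(p .+ X)   # first-order stand-in for exp_p
invretr(p, q) = q ./ dot(p, q) .- p        # first-order stand-in for log_p

A = [1.0 2.0 0.0; 0.0 1.0 0.0; 1.0 0.0 3.0]
f(p) = A * p ./ norm(A * p)                # an illustrative map Sphere(2) -> Sphere(2)

p  = [1.0, 0.0, 0.0]                       # base point
v1 = [0.0, 1.0, 0.0]                       # orthonormal basis of U_M = T_pM
v2 = [0.0, 0.0, 1.0]

g(t) = invretr(f(p), f(retr(p, t[1] .* v1 .+ t[2] .* v2)))
J = ForwardDiff.jacobian(g, zeros(2))      # 3x2 matrix; df_p(a*v1 + b*v2) is (approximately) J * [a, b]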

mateuszbaran commented 4 years ago

I keep thinking about this problem and I hope we can find a solution that doesn't involve ONBs of tangent spaces. Computing them would be prohibitively expensive for high-dimensional manifolds.

I've looked at what pymanopt does and I don't see any justification that these projections in egrad2rgrad result in actual Riemannian gradients and not just some approximations that are usually good enough for optimization.

mateuszbaran commented 4 years ago

I've made a few changes to the text that hopefully provide a good enough justification for the projection I've originally proposed:

\documentclass{article}
\usepackage[utf8]{inputenc}

\usepackage{amsmath}
\usepackage{amssymb}

\begin{document}

Let $M$ be a Riemannian manifold embedded in $\mathbb{R}^{\hat{m}}$ by $h_M\colon M \to \mathbb{R}^{\hat{m}}$ and $N$ be a Riemannian manifold embedded in $\mathbb{R}^{\hat{n}}$ by $h_N\colon N \to \mathbb{R}^{\hat{n}}$.
Let $p$ be a point on $M$ with a neighborhood $M_{p} \subseteq M$.
Let $f \colon M_{p} \to N$ be a $C^1$ map.
The embeddings are not necessarily isometric but they must be at least $C^1$ and invertible in a neighborhood of $p$ (or, respectively, $f(p)$).

The differential $\mathrm{d} f_p \colon T_p M \to T_{f(p)} N$ is the function such that for any curve $\gamma \colon \mathbb{R} \to M$ with $\gamma(0) = p$ and $\gamma'(0) = v$ (that is, $\gamma$ corresponds to the tangent vector $v\in T_{p}M$), we have
\begin{equation}
\label{eq:differential}
    \mathrm{d}f_p(\gamma'(0)) = (f \circ \gamma)'(0).
\end{equation}
For example $\gamma(t) = \exp_p(tv)$. Thus
\begin{equation}
\label{eq:differential-2}
    \mathrm{d}f_p(v) = \frac{\mathrm{d}}{\mathrm{d} t}f(\exp_p(tv))\Big|_{t=0}.
\end{equation}
This can also be approximated by
\begin{equation}
\label{eq:differential-3}
    \mathrm{d}f_p(v) \approx \log_{f(p)}( f(\exp_p(v)))
\end{equation}
for small vectors $v$. For sufficiently small vectors $v$ all functions are defined.

We can represent the function $f$ as a mapping between subsets of the spaces $M_p$ and $N$ are embedded in, $\hat{f} \colon h_M(M_p) \to \mathbb{R}^{\hat{n}}$:
\begin{equation}
    \hat{f}(x) = h_N(f(h_M^{-1}(x)))
\end{equation}
for any $x\in h_M(M_p)$.

Similarly, let us assume that $T_p M$ is embedded by $h_{T_p M}$ as a linear subspace $U_M$ of $\mathbb{R}^{\hat{m}}$ and $T_{f(p)} N$ is embedded by $h_{T_{f(p)} N}$ as a linear subspace $U_N$ of $\mathbb{R}^{\hat{n}}$.
Using these, we can represent the differential $\mathrm{d}f_p$ in the embedding by $\hat{\mathrm{d}f_p} \colon U_M \to U_N$:
\begin{equation}
    \hat{\mathrm{d}f_p}(v) = h_{T_{f(p)} N}(\mathrm{d}f_p (h_{T_p M}^{-1}(v)))
\end{equation}
for any $v \in U_M$.

Let $\hat{\mathrm{d}f_{p,\Pi}}$ be an extension of $\hat{\mathrm{d}f_p}$ to the entire space $\mathbb{R}^{\hat{m}}$ given by
\begin{equation}
    \hat{\mathrm{d}f_{p,\Pi}}(v) = \hat{\mathrm{d}f_{p}}(\Pi_{U_M}(v))
\end{equation}
where $v\in \mathbb{R}^{\hat{m}}$ and $\Pi_{U_M}\colon \mathbb{R}^{\hat{m}} \to U_M$ is the orthogonal projection onto the subspace $U_M$.
In this setting $\hat{\mathrm{d}f_{p,\Pi}}$ is just a linear transformation between two vector spaces. It is thus completely determined by, for example, its values on a basis of $\mathbb{R}^{\hat{m}}$.

Substituting everything we get
\begin{equation}
    \label{eq:lots-of-circs}
    \hat{\mathrm{d}f_{p,\Pi}}(v) \approx \tilde{\mathrm{d}f_p}(v) = (h_{T_{f(p)} N}\circ \log_{f(p)}\circ h_N^{-1} \circ \hat{f}\circ h_M \circ \exp_p \circ h_{T_p M}^{-1} \circ \Pi_{U_M})(v)
\end{equation}
which might look useless until we notice that we can calculate values of $h_{T_{f(p)} N}\circ \log_{f(p)}\circ h_N^{-1}$, $\hat{f}$, $h_M \circ \exp_p \circ h_{T_p M}^{-1}$ and $\Pi_{U_M}$ easily in our computer programs.

One way forward now is to put different vectors $v$ into Eq.~\eqref{eq:lots-of-circs} and see what is returned. We could, however, take an orthonormal basis $v_1, v_2, \dots, v_{\hat{m}}$ of $\mathbb{R}^{\hat{m}}$, define
\begin{equation}
    g(t_1, t_2, \dots, t_{\hat{m}}) = \tilde{\mathrm{d}f_p}\left(\sum_{i=1}^{\hat{m}} t_i v_i\right)
\end{equation}
and calculate the Jacobian of $g$ at zero using automatic differentiation to get an easy method of computing $\hat{\mathrm{d}f_{p,\Pi}}(v)$, which is equal to $\hat{\mathrm{d}f_p}(v)$ on the subspace $U_M$.

\end{document}

Maybe it's not perfect and there will be some numerical issues, but it looks more solid to me than what pymanopt does.
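
A correspondingly rough sketch of this projection variant, for the same illustrative map between spheres as in the earlier sketch (proj plays the role of \Pi_{U_M}; projection-style retractions again stand in for exp and log):

using ForwardDiff, LinearAlgebra

retr(p, X)    = (p .+ X) ./ norm(p .+ X)   # first-order stand-in for exp_p
invretr(p, q) = q ./ dot(p, q) .- p        # first-order stand-in for log_p
proj(p, v)    = v .- dot(p, v) .* p        # orthogonal projection onto U_M = T_pM

A = [1.0 2.0 0.0; 0.0 1.0 0.0; 1.0 0.0 3.0]
f(p) = A * p ./ norm(A * p)                # an illustrative map Sphere(2) -> Sphere(2)
p = [1.0, 0.0, 0.0]

g2(v) = invretr(f(p), f(retr(p, proj(p, v))))
J = ForwardDiff.jacobian(g2, zeros(3))     # 3x3 matrix; restricted to U_M it represents df_p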

mateuszbaran commented 2 years ago

@kellertuer Would this paper be helpful for designing the interface for Riemannian Hessians? Link: https://arxiv.org/abs/2009.10159

kellertuer commented 2 years ago

Looks interesting, though it starts in the embedding as well. But the quotient part looks cool once we have quotients (which I want to tackle when our reorg is done).