`parallel_transport_to` for `ProbabilitySimplex`

adienes commented 1 year ago

Is this just the identity?

I'm by no means a geometer and am probably way off base, but from what I can gather reading some articles about parallel transport, I think since the simplex is flat everywhere can the method be implemented like

parallel_transport_to!(::ProbabilitySimplex, Y, ::Any, X, ::Any) = copyto!(Y, X)

kellertuer commented 1 year ago

That depends a lot on the metric you impose on that set. Could you provide references? My short search did not bring up that many results that I found helpful concerning parallel transport. The one reason I think might indicate that this is not the case, is, that the set is open (i.e. you never reach the boundary, which for me feels quite non-Euclidean.

edit: For example here https://ww3.math.ucla.edu/camreport/cam18-17.pdf they introduce a metric that is not the one we use (but the could add that one for sure) – see Def 5 and that yields a parallel transport that is not just the identity, cf. Proposition 5.

mateuszbaran commented 1 year ago

No, ProbabilitySimplex is not flat (w.r.t Fisher-Rao metric). As a reference, it looks like https://github.com/geomstats/geomstats/blob/master/geomstats/information_geometry/multinomial.py is the same as our ProbabilitySimplex (just with a different scaling) and it says it has positive sectional curvature. I couldn't find any source for PT on that manifold so most likely there is no known closed-form formula.

sethaxen commented 1 year ago

I haven't read the literature on the Fisher-Rao metric, but it seems to be closely related to the Euclidean metric on the sphere. Specifically, let $p \in \mathbb{S}^n$ and let $f: \mathbb{S}^n \to \Delta^n$ be the elementwise square: $q_i = f(p)_i = p_i^2$.

Then the differential of this map $\mathrm{d}f_p: Tp\mathbb{S}^n \to T{f(p)}\Delta^n$ has the property $(X_q)_i = (\mathrm{d}f_p)(X_p)_i = 2 p_i (X_p)_i$. The Fisher-Rao metric is

$$g: (X_q, Yq) \mapsto \sum{i=1}^n \frac{(X_q)_i (Y_q)_i}{q_i} = \langle 2X_p, 2Y_p \rangle_F$$

So the Fisher-Rao metric is equivalent to evaluating the metric on the sphere after mapping each vector to the tangent space to the sphere and scaling them by length 2. This is also why exp is so similar for the two manifolds: (see https://juliamanifolds.github.io/Manifolds.jl/stable/manifolds/probabilitysimplex.html#Base.exp-Tuple{ProbabilitySimplex,%20Vararg{Any}} and https://juliamanifolds.github.io/Manifolds.jl/stable/manifolds/sphere.html#Base.exp-Tuple{AbstractSphere,%20Vararg{Any}}). They're actually the same form but using $f$ (or its inverse, if we limit the sphere to the positive orthant):

julia> using Manifolds, LinearAlgebra, Random;

julia> M1 = ProbabilitySimplex(10);

julia> M2 = Sphere(10);

julia> q = normalize(randexp(representation_size(M1)), 1);

julia> Xq = project(M1, q, randn(representation_size(M1)));

julia> Xq ./= norm(M1, q, Xq);

julia> f(p) = p.^2;

julia> dfinv(q, Xq) = @. Xq / sqrt(q) / 2;

julia> finv(q) = sqrt.(q);

julia> exp(M1, q, Xq) ≈ f(exp(M2, finv(q), dfinv(q, Xq)))
true

So it seems that similarly parallel transport on ProbabilitySimplex could be implemented using the same relationship to the Sphere.

sethaxen commented 1 year ago

This actually raises a good point I hadn't thought of yet. The uniform measure implemented in #604 presumes the use of the Euclidean metric. A uniform measure presuming the Fisher-Rao metric would be implemented using randn!(rng, q); normalize!(q, 2); q .= q.^2

Edit: I vaguely recall that this is the Dirichlet(fill(1/2, n+1)) distribution.

Edit2: FWIW the measure used in the PR is probably the right one to use, as either the Lebesgue measure or this one (agrees with Lebesgue to within a scalar IIRC) is usually the reference measure on the simplex.

kellertuer commented 1 year ago

Oh I did not think of that either. We could still keep the current one with respect to the metric manifold and EuclideanMetric then, though the Euclidean Metric is not a Riemannian Metric on the probabilities (since exp he is just plus and then you would leave the manifold quite fast).

mateuszbaran commented 1 year ago

Right, I tend to forget which manifolds with Fisher-Rao metric are just transformed (positive orthants of scaled) spheres. We can indeed derive parallel transport this way.

the Euclidean Metric is not a Riemannian Metric on the probabilities (since exp he is just plus and then you would leave the manifold quite fast).

Hm? Probability simplex with neither Fisher-Rao nor Euclidean metric are geodesically complete. And why wouldn't Euclidean metric on the probability simplex be Riemannian?

kellertuer commented 1 year ago

Oh, in my mind Fisher-Rao was complete, but you might be right. Then even more we can keep the current one for the MetricManifold{ℝ,ProbablilitySimplex,EuclideanMetric}

sethaxen commented 1 year ago

I'm not certain if Fisher-Rao is technically geodesically complete. If you follow the curve defined by the geodesic, it either turns or reflects at the boundary (apparent by considering the geodesic on the sphere as it crosses between orthants), so one can follow the curve indefinitely, but it may only be complete if one considers the boundary as part of the manifold.

kellertuer commented 1 year ago

...I think even without the boundary it should be complete. But I can check later for the simplex what happens – we ad least had a few cases with growing step sizes / steps towards the boundary, that were of course w.r.t. Fisher-Rao so that the Euclidean ones of course got shorter and shorter to not reach the boundary.

In that sense I had the simplex always in mind similar to SPDs, when you try to “run” towards an eigenvalue 0.

adienes commented 1 year ago

do you three have a recommended textbook on this subject at the advanced undergraduate / early graduate level? I am definitely missing some context to be able to follow smoothly :)

sethaxen commented 1 year ago

we ad least had a few cases with growing step sizes / steps towards the boundary, that were of course w.r.t. Fisher-Rao so that the Euclidean ones of course got shorter and shorter to not reach the boundary.

In that sense I had the simplex always in mind similar to SPDs, when you try to “run” towards an eigenvalue 0.

No, it really does turn (remember the geodesic on the sphere). It also creates a cycle.

using Plots, Manifold
M = ProbabilitySimplex(2)
p = project(M, ones(representation_size(M)))
X = project(M, p, randn(representation_size(M)))
X ./= norm(M, p, X)
plot([0, 1, 0, 0], [1, 0, 0, 1]; color=:black, label="bounds")
t = 0:0.01:2π
plot!(eachrow(reduce(hcat, geodesic(M, p, X, t))[1:2, :])...; label="geodesic")

tmp

kellertuer commented 1 year ago

Oh, fancy! I wanted to do such a plot as well next, but I got a bit stuck on Caching.

mateuszbaran commented 1 year ago

Great illustration, I think we should have something like that in docs :+1: . So we can either consider probability simplex as a manifold (without boundary, so geodesics just stop) or a manifold with boundary (but some of our methods don't work correctly on the boundary).

kellertuer commented 1 year ago

do you three have a recommended textbook on this subject at the advanced undergraduate / early graduate level? I am definitely missing some context to be able to follow smoothly :)

The main thing to have in mind here is, that a Riemannian manifold especially need an inner product on each of its tangent spaces. And there usually exist several choices that yield a Riemannian manifold. Text book for the general case: Probably Lee (https://link.springer.com/book/10.1007/978-3-319-91755-9) or do Carmo (https://link.springer.com/book/9780817634902); concerning computations either Absil, Mahony Sepulchre (https://press.princeton.edu/absil) or more recent Boumal (https://www.nicolasboumal.net/#book); the last one has a quite soft start with embedded manifolds.

mateuszbaran commented 1 year ago

do you three have a recommended textbook on this subject at the advanced undergraduate / early graduate level? I am definitely missing some context to be able to follow smoothly :)

Ronny's recommendations are good but keep in mind that most textbooks are very heavy on theory, which is usually developed in a different way than Manifolds.jl works (atlases vs embeddings). Atlases are more general but they are often harder to work with (computationally) than embeddings.

kellertuer commented 1 year ago

Ronny's recommendations are good but keep in mind that most textbooks are very heavy on theory, which is usually developed in a different way than Manifolds.jl works (atlases vs embeddings). Atlases are more general but they are often harder to work with (computationally) than embeddings.

That's a main feature of Nicolas' book (the last link) – it is the one getting to numerics the fastest – maybe because Nicolas is the main developer behind Manopt/Matlab :)

edit: the next closest is the Absil, Mahony, Sepulchre – who were also involved in the Matlab package (i.e. the RANSO group in LLN).

mateuszbaran commented 1 year ago

So, a few tips how to derive parallel transport on ProbabilitySimplex with Fisher-Rao metric. Seth has written most needed methods here: https://github.com/JuliaManifolds/Manifolds.jl/issues/606#issuecomment-1546866436 . One additional thing is differential of f: df(p, Xp) = 2 .* p .* Xp. Now, to compute parallel transport on ProbabilitySimplex you can just convert to sphere and back, more or less like this:

function parallel_transport_to!(::ProbabilitySimplex{n}, Y, p, X, q) where {n}
    qS = finv(q)
    parallel_transport_to!(Sphere(n), Y, finv(p), dfinv(p, X), qS)
    Y .= df(qS, Y)
    return Y
end

Now all that is needed is a bit of optimization to avoid allocating unnecessary intermediate values. You can also implement riemann_tensor! in a similar way for ProbabilitySimplex.

JuliaManifolds / Manifolds.jl

`parallel_transport_to` for `ProbabilitySimplex` #606