JuliaManifolds / Manifolds.jl

Manifolds.jl provides a library of manifolds aiming for an easy-to-use and fast implementation.
https://juliamanifolds.github.io/Manifolds.jl
MIT License

Towards differentials on SO(n) #453

Closed – const-ae closed this issue 2 years ago

const-ae commented 2 years ago

I am working on a problem where I would like to implement geodesic regression on rotation manifolds (SO(n)). Following the geodesic regression tutorial in Manopt, I dove into learning more about differentials of matrix exponentials and read the paper by Rentmeesters (2011), who explains how to form an orthonormal basis diagonalizing the Jacobi operator for SO(3).
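Written out, the operator in question is the Jacobi operator along the geodesic with direction $A$; for the bi-invariant metric on SO(n) it reads (in my notation, and matching the jacobi_op helper that appears later in this thread)

$$J_A(X) = -\tfrac{1}{4}[A, [A, X]] = \tfrac{1}{4}\bigl(-A^2 X + 2 A X A - X A^2\bigr).$$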

However, trying to implement his approach, I don't seem to be able to get the differential right:

function difficult_problem(S)
    Y = random_point(S)
    p = random_point(S)
    v = random_tangent(S, p)
    dir = random_tangent(S, p)
    M = TangentSpace(S, p)

    cost_fnc(arg) = 0.5 * sum((Y - exp(S, p, arg)).^2)
    diff_fnc(arg, v) = -(Y - exp(S, p, arg)) .* differential_exp_argument(S, p, arg, v)
    return check_diff(M, cost_fnc, diff_fnc, p = dir, v = v)
end

difficult_problem(Euclidean(5))
difficult_problem(Sphere(3))
difficult_problem(Rotations(3))

In the above code example I am using the check_diff function, which tests whether the error of the first-order Taylor expansion decreases quadratically, following Section 4.8 of Boumal (2021).

using Manifolds, Manopt, Random, LinearAlgebra
import Manifolds.embed!
import Manifolds.mul!
import Manifolds.get_basis
using Plots

function get_basis(M::Rotations{3}, p, B::DiagonalizingOrthonormalBasis{ℝ})
    if manifold_dimension(M) != 3
        error("get_basis is only defined for Rotations(3).")
    end
    decomp = schur(B.frame_direction)
    beta = decomp.Schur[3,2]
    U = decomp.vectors
    e1, e2, e3 = [[1,0,0], [0,1,0], [0,0,1]]
    println(sum((B.frame_direction - U * (beta .* (e3 * e2' - e2 * e3') * U')).^2))
    # eigenvectors
    Ξ = [U * (e3 * e2' - e2 * e3') * U',
         U * (e2 * e1' - e1 * e2') * U',
         U * (e3 * e1' - e1 * e3') * U']
    # corresponding eigenvalues
    κ = [0,beta^2 / 4,beta^2 / 4]
    return CachedBasis(B, κ, Ξ)
end

function embed!(::Rotations, Y, p, X)
    return mul!(Y, p, X)
end

function check_diff(M::AbstractManifold, cost_fnc, diff_fnc; p = random_point(M), v = random_tangent(M, p),
      log_space_range = range(-8,0, length = 51))

    f0 = cost_fnc(p)
    df0 = sum(diff_fnc(p, embed(M, p, v)))

    log_space = 10 .^log_space_range

    # Based on the first-order Taylor expansion of the cost function around point x:
    # f(R_x(t * v)) = f(x) + t * <grad f(x), v> + O(t^2)
    lhs = [cost_fnc(geodesic(M, p, v, t)) for t in log_space]
    rhs = [f0 + t * df0 for t in log_space]
    err = abs.(rhs - lhs)

    plt = scatter(log10.(log_space), log10.(err), legend = false); Plots.abline!(plt, 1, 0); Plots.abline!(plt, 2, 0)
    return plt
end

function easy_problem(S)
    data = random_point(S)
    Y = random_point(S)

    cost_fnc(arg) = 0.5 * sum((Y - arg).^2)
    diff_fnc(arg, v) = -(Y - arg) .* v
    return check_diff(S, cost_fnc, diff_fnc)
end

easy_problem(Euclidean(5))
easy_problem(Sphere(3))
easy_problem(Stiefel(5, 2))
easy_problem(Rotations(3))

mateuszbaran commented 2 years ago

I've tried to figure it out, but you have quite a weird example here. First, it would be a good idea to indicate which manifold/tangent space each quantity belongs to, and whether the operations performed actually make sense.

This: diff_fnc(p, embed(M, p, v)) indicates that diff_fnc expects a vector from the Euclidean space as the second argument (that's what embed does), but in the difficult problem, diff_fnc(arg, v) = -(Y - exp(S, p, arg)) .* differential_exp_argument(S, p, arg, v), it is used as a tangent vector on S (so embed is not a good idea). Note that M = TangentSpace(S, p) inherits the representation from S, so we can just use S here.

Next, I think df0 doesn't do what's in Section 4.8 of that book. I'd change sum(something .* something) into an appropriate call to inner. They are usually the same but definitely not always (see the aside after the code below). Also, subtracting two points is not a valid notion on a manifold; you can use log instead.

All in all, this is how I managed to (partially) clean it up:


function check_diff(M::AbstractManifold, cost_fnc, diff_fnc; p = random_point(M), v = random_tangent(M, p),
      log_space_range = range(-8,0, length = 51))

    f0 = cost_fnc(p)
    df0 = inner(M, p, v, diff_fnc(p))

    log_space = 10 .^log_space_range

    # Based on the first-order Taylor expansion of the cost function around point x:
    # f(R_x(t * v)) = f(x) + t * <grad f(x), v> + O(t^2)
    lhs = [cost_fnc(exp(M, p, v, t)) for t in log_space]
    rhs = [f0 + t * df0 for t in log_space]
    err = abs.(rhs - lhs)

    plt = scatter(log10.(log_space), log10.(err), legend = false); Plots.abline!(plt, 1, 0); Plots.abline!(plt, 2, 0)
    return plt
end

function difficult_problem(S)
    q = random_point(S)
    p = random_point(S)
    v = random_tangent(S, p)
    dir = random_tangent(S, p)

    cost_fnc(arg) = 0.5 * distance(S, q, exp(S, p, arg))^2
    diff_fnc(arg) = adjoint_differential_exp_argument(S, p, arg, -log(S, q, exp(S, p, arg)))
    return check_diff(S, cost_fnc, diff_fnc, p = dir, v = v)
end

function easy_problem(S)
    p = random_point(S)

    cost_fnc(arg) = 0.5 * distance(S, p, arg)^2
    diff_fnc(arg) = -log(S, arg, p)
    return check_diff(S, cost_fnc, diff_fnc)
end
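
As an aside, here is a minimal example of the inner-vs-sum caveat mentioned above: on manifolds whose metric is the restriction of the Euclidean one the two coincide, but e.g. on the hyperboloid model of Hyperbolic space (Minkowski metric) they do not.

using Manifolds, LinearAlgebra
M = Hyperbolic(2)
p = [sinh(1.0), 0.0, cosh(1.0)]   # point on the hyperboloid
X = [cosh(1.0), 0.0, sinh(1.0)]   # tangent vector at p (Minkowski-orthogonal to p)
inner(M, p, X, X)                 # 1.0 (Minkowski inner product)
sum(X .* X)                       # cosh(2) ≈ 3.76 (plain Euclidean sum)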

easy_problem seems to work completely but difficult_problem does not. I'm not exactly sure how to use differentials from Manopt.jl, so @kellertuer, could you help me fix that gradient?

kellertuer commented 2 years ago

I just took a five-minute look and was surprised that the sum in @const-ae's cost function has a minus (a difference of points) – I would expect a distance on the manifold.

Then I am not sure why you use an adjoint differential for the differential? I would compose the differential of the distance (which is basically a log) with the differential of the exp.

With the adjoint you already go for the gradient (then diff_fnc is misleading). So maybe

    diff_fnc(arg) = differential_exp_argument(S, p, arg, -log(S, q, exp(S, p, arg)))

(if I do not mix up the order, but I think the order switches, so dist/log is the inner differential and exp/its differential is the outer one). I do not have the time to check that right now, so this is just a first guess. Ah, no, compare (4.21) in Nicolas Boumal's book – hm, sorry, but I first have to prepare my lecture material for next week.

kellertuer commented 2 years ago

I found at least a short moment to check the easy one.

Step 1

I adapted check_diff slightly, mainly because cost/diff/grad functions have the manifold as first argument (by convention) in Manopt:

function check_diff(M::AbstractManifold, cost_fnc, diff_fnc; p = random_point(M), v = random_tangent(M, p),
      log_space_range = range(-8,0, length = 51))

    f0 = cost_fnc(M, p)
    df0 = diff_fnc(M, p, v)

    log_space = 10 .^log_space_range

    # Based on the first-order Taylor expansion of the cost function around point x:
    # f(R_x(t * v)) = f(x) + t * <grad f(x), v> + O(t^2)
    lhs = [cost_fnc(M, geodesic(M, p, v, t)) for t in log_space]
    rhs = [f0 + t * df0 for t in log_space]
    err = abs.(rhs - lhs)

    plt = scatter(log10.(log_space), log10.(err), legend = false);
    Plots.abline!(plt, 1, 0); Plots.abline!(plt, 2, 0)
    return plt
end

Step 2

The easy problem

function easy_problem(S)
    q = random_point(S)

    cost_fnc(M, p) = 1//2 * distance(S, q, p)^2
    diff_fnc(M, p, v) = inner(M, p, -log(M, p, q), v)
    return check_diff(S, cost_fnc, diff_fnc)
end

Note

  1. the original cost just had a minus – but we have no minus on manifolds, so the correct one is a distance^2. I renamed Y to q since it is a point on the manifold.
  2. Now our cost is a function M -> R, so the differential Df(p)[v] yields a number indicating how much the function changes when walking from p in direction v.

You could check https://manoptjl.org/stable/functions/gradients.html#Manopt.grad_distance for the gradient to see that it is -log (and then the differential is just the inner product of v with the gradient).
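
A quick numerical sanity check of that claim (just a sketch, reusing the random_point/random_tangent helpers from above):

using Manifolds, Manopt
S = Rotations(3)
q = random_point(S); p = random_point(S); v = random_tangent(S, p)
f(x) = 0.5 * distance(S, q, x)^2
df = inner(S, p, -log(S, p, q), v)    # claimed differential Df(p)[v]
t = 1e-6
fd = (f(exp(S, p, v, t)) - f(p)) / t  # forward difference along the geodesic
abs(df - fd)                          # should be of order t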

Calling easy_problem(Rotations(3)), the check prints

[Screenshot 2022-02-14: the resulting error plot of the check]

That the previous version looked fine is mainly because, for the manifolds you checked, either minus-and-norm coincided with the distance, or in the embedding it was close enough.

I will check the similar ideas for the more involved one later – but I hope this already gives a good idea how to do these :)

kellertuer commented 2 years ago

For the complicated problem I think (but the check still disagrees) it should be something like

function difficult_problem(S)
    p = random_point(S)
    q = random_point(S)
    M = TangentSpace(S, p)

    cost_fnc(M, X) = 0.5 * distance(S, q, exp(S, p, X)).^2
    diff_fnc(M, X, Y) = inner(S, exp(S, p, X), -log(S, exp(S, p, X), q), differential_exp_argument(S, p, X, Y))

    X = random_tangent(S, p)
    Y = random_tangent(S, p)
    return check_diff(M, cost_fnc, diff_fnc, p = X, v = Y)
end

where I might still confuse the chain rule – will try to check with pen/paper later (as well as provide an explanation for the formula there).
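
In formulas, the chain rule that diff_fnc above encodes should read (my sketch)

$$f(X) = \tfrac{1}{2}\, d\bigl(q, \exp_p(X)\bigr)^2, \qquad Df(X)[Y] = \bigl\langle -\log_{\exp_p(X)} q,\; D\exp_p(X)[Y] \bigr\rangle_{\exp_p(X)}.$$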

edit: I found a few bugs in my code, but this still does not seem to work correctly, so here is also my derivation of why I think this is correct – and the next thing to check is maybe the ONB? Because for some values the error looks okay-ish. [IMG_4522: photo of the handwritten derivation]

mateuszbaran commented 2 years ago

I've fixed the problems with the diagonalizing ONB, although the code is a bit ugly now:

function get_basis(M::Rotations{3}, p, B::DiagonalizingOrthonormalBasis{ℝ})
    if manifold_dimension(M) != 3
        error("get_basis is only defined for Rotations(3).")
    end
    decomp = schur(B.frame_direction)
    U = decomp.vectors
    e1, e2, e3 = [[1,0,0], [0,1,0], [0,0,1]]
    e32 = (e3 * e2' - e2 * e3')
    e21 = (e2 * e1' - e1 * e2')
    e31 = (e3 * e1' - e1 * e3')
    println("Schur: ", decomp.Schur)
    beta_e32 = (e32 .* decomp.Schur)[3,2]
    beta_e31 = (e31 .* decomp.Schur)[3,1]
    beta_e21 = (e21 .* decomp.Schur)[2,1]

    beta_e32_abs = abs(beta_e32)
    beta_e31_abs = abs(beta_e31)
    beta_e21_abs = abs(beta_e21)

    if beta_e32_abs > beta_e31_abs && beta_e32_abs > beta_e21_abs
        beta = beta_e32
        e_fd = e32
        e_1 = e21
        e_2 = e31
    elseif beta_e31_abs > beta_e21_abs
        beta = beta_e31
        e_fd = e31
        e_1 = e21
        e_2 = e32
    else
        beta = beta_e21
        e_fd = e21
        e_1 = e32
        e_2 = e31
    end

    println(beta_e32, " || ", beta_e21, " || ", beta_e31)

    println("Beta: ", beta)

    println("decomposition error: ", norm(B.frame_direction - U * beta * e_fd * U'))
    println("Vectors: ", decomp.vectors)
    println(beta)
    # eigenvectors
    Ξ = [U * e_fd * U' ./ sqrt(2),
         U * e_1 * U'./ sqrt(2),
         U * e_2 * U'./ sqrt(2)]

    println("Frame direction: ", B.frame_direction)
    println(Ξ)
    # corresponding eigenvalues
    κ = [0,beta^2 / 4,beta^2 / 4]
    return CachedBasis(B, κ, Ξ)
end

The two problems were:

  1. Julia's Schur decomposition doesn't always use the same decomposition pattern as expected in Rentmeesters (2011). It returns one of three possible patterns, and the first big part of the code I've added figures out which pattern it is (see the small example after this list).
  2. Rentmeesters (2011) uses a differently scaled metric on SO(3), so I had to rescale the vectors in Ξ.
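
To illustrate point 1 (a small sketch): the real Schur factor T is quasi-triangular, and where the 2x2 rotation block (and hence beta) ends up is not fixed, which is why the code above inspects T[3,2], T[3,1] and T[2,1] and picks the largest entry in absolute value.

using LinearAlgebra
A = [0.0 1.0 0.0; -1.0 0.0 0.0; 0.0 0.0 0.0]  # skew-symmetric test matrix
F = schur(A)
display(F.T)  # inspect where the off-diagonal entry of the 2x2 block sits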

Now the plot looks as expected: [image: the resulting error plot]

kellertuer commented 2 years ago

Wow! Nicest team work!

edit: the first check is not necessary since we already dispatch on Rotations{3}.

const-ae commented 2 years ago

Wow, that is great. Thank you both so much for figuring this out. I will test it tomorrow :)

mateuszbaran commented 2 years ago

For reference, here are some Jacobians worked out: https://vision.in.tum.de/_media/members/demmeln/nurlanov2021so3log.pdf but I'm not currently sure how they relate to differentials of log and exp on SO(3).

const-ae commented 2 years ago

Hi, quick update. I took @mateuszbaran's code and generalized it to rotations of arbitrary dimension 🥳 The code is not particularly appealing (I haven't found an elegant way to encode the pattern in which the eigenvectors occur, so it is rather brute-force-ish), but it works and is surprisingly efficient (median 30 ms for get_basis() with N = 50).

using LinearAlgebra
using Manifolds, Manopt
import Manifolds.get_basis
using Plots

function check_diff(M::AbstractManifold, cost_fnc, diff_fnc; p = random_point(M), v = random_tangent(M, p),
      log_space_range = range(-8,0, length = 51))

    f0 = cost_fnc(M, p)
    df0 = diff_fnc(M, p, v)

    log_space = 10 .^log_space_range

    # Based on the first-order Taylor expansion of the cost function around point x:
    # f(R_x(t * v)) = f(x) + t * <grad f(x), v> + O(t^2)
    lhs = [cost_fnc(M, geodesic(M, p, v, t)) for t in log_space]
    rhs = [f0 + t * df0 for t in log_space]
    err = abs.(rhs - lhs)

    plt = scatter(log10.(log_space), log10.(err), legend = false);
    Plots.abline!(plt, 1, 0); Plots.abline!(plt, 2, 0)
    return plt
end

#### NEW CODE ####

function get_tridiagonal_elements(trian)
    N = size(trian, 1)
    res = zeros(N)
    down = true
    for i in 1:N
        if i == N && down
            elem = 0
        else
            elem = trian[i + (down ? +1 : -1), i]
        end
        if elem ≈ 0
            res[i] = 0
        else
            res[i] = elem
            down = !down
        end
    end
    return res
end

function ev_diagonal(tridiagonal_elements, unitary; i)
    a, b = unitary[:, i], unitary[:, i+1]
    evec = [[-b a] * [a b]' ./ sqrt(2)]
    evals = [0.0]
    return (values = evals, vectors = evec)
end

function ev_offdiagonal(tridiagonal_elements, unitary; i, j)
    a, b, c, d = [unitary[:, k] for k in [i, i+1, j, j+1]]
    ref = transpose(view(unitary, :, [i,i+1, j,j+1]))
    evec = [
        [-c -d a  b] * ref ./ sqrt(4),
        [-c  d a -b] * ref ./ sqrt(4),
        [-d -c b  a] * ref ./ sqrt(4),
        [ d -c b -a] * ref ./ sqrt(4),
    ]
    evals = [(tridiagonal_elements[i] - tridiagonal_elements[j])^2 / 4,
             (tridiagonal_elements[i] + tridiagonal_elements[j])^2 / 4,
             (tridiagonal_elements[i] + tridiagonal_elements[j])^2 / 4,
             (tridiagonal_elements[i] - tridiagonal_elements[j])^2 / 4]
    return (values = evals, vectors = evec)
end

function ev_zero(tridiagonal_elements, unitary; i)
    evec = Vector{Array{Float64,2}}()
    evals = Vector{Float64}()
    ref = unitary[:, i]
    for idx in 1:size(unitary,1)
        if idx != i
            push!(evec,  (ref * unitary[:, idx]' - unitary[:, idx] * ref') ./ sqrt(2))
            push!(evals,  tridiagonal_elements[idx]^2 / 4)
        end
    end
    return (values = evals, vectors = evec)
end

function get_basis(M::Rotations{N}, p, B::DiagonalizingOrthonormalBasis{ℝ}) where N
    trian, unitary, ev = schur(B.frame_direction)
    trian_elem = get_tridiagonal_elements(trian)
    evec = Vector{Array{Float64,2}}()
    evals = Vector{Float64}()
    i = 1
    while i <= N
        if trian_elem[i] == 0
            evs = ev_zero(trian_elem, unitary, i=i)
            push!(evec, evs[:vectors]...)
            push!(evals, evs[:values]...)
            i += 1
        else
            evs = ev_diagonal(trian_elem, unitary, i=i)
            push!(evec, evs[:vectors]...)
            push!(evals, evs[:values]...)
            j = 1
            while j < i
                if trian_elem[j] == 0
                    j += 1
                else
                    evs = ev_offdiagonal(trian_elem, unitary, i=i, j=j)
                    push!(evec, evs[:vectors]...)
                    push!(evals, evs[:values]...)
                    j += 2
                end
            end
            i += 2
        end
    end
    return CachedBasis(B, evals, evec)
end

jacobi_op(A, X) = 1/4 * (-A * A * X + 2 * A * X * A - X * A * A)
skew(M) = (M - M') / 2
N = 50
A = skew(randn(N, N));
B = get_basis(Rotations(N), nothing, DiagonalizingOrthonormalBasis(A));
evals = B.data.eigenvalues
evec = B.data.vectors
# Check eigen equation
all([isapprox(jacobi_op(A, vec), lambda * vec, atol = 1e-12) for (lambda, vec) in zip(evals, evec)])
# check that all eigen vectors are orthogonal
all([isapprox(sum(ev1 .* ev2), 0, atol = 1e-10) for ev1 = evec, ev2 = evec if ev1 != ev2])
all([sum(ev .* ev) ≈ 1 for ev in evec])

function difficult_problem(S)
    p = random_point(S)
    q = random_point(S)
    M = TangentSpace(S, p)

    cost_fnc(M, X) = 0.5 * distance(S, q, exp(S, p, X)).^2
    diff_fnc(M, X, Y) = inner(S, exp(S, p, X), -log(S, exp(S, p, X), q), differential_exp_argument(S, p, X, Y))

    X = random_tangent(S, p)
    Y = random_tangent(S, p)
    return check_diff(M, cost_fnc, diff_fnc, p = X, v = Y)
end

difficult_problem(Rotations(N))

kellertuer commented 2 years ago

Nice!

I am sure Mateusz can still find some ways to optimise it – but a PR of this would be neat! For that I would just ask for more documentation, and for the basis itself ideally with math formulae and a reference.

mateuszbaran commented 2 years ago

That's great! I'll try optimizing it a bit during the weekend.

mateuszbaran commented 2 years ago

This is a bit faster, though maybe a little less nice. I've mostly worked on reducing the number of allocations.

function get_tridiagonal_elements(trian)
    N = size(trian, 1)
    res = zeros(N)
    down = true
    for i in 1:N
        if i == N && down
            elem = 0
        else
            elem = trian[i + (down ? +1 : -1), i]
        end
        if elem ≈ 0
            res[i] = 0
        else
            res[i] = elem
            down = !down
        end
    end
    return res
end

function ev_diagonal(tridiagonal_elements, unitary, evec, evals, fill_at; i)
    a = unitary[:,i]
    b = unitary[:,i+1]
    evec[fill_at.x] = [-b a] * [a b]' ./ sqrt(2)
    evals[fill_at.x] = 0
    fill_at.x += 1
end

function ev_offdiagonal(tridiagonal_elements, unitary, evec, evals, fill_at; i, j)
    a = unitary[:,i]
    b = unitary[:,i+1]
    c = unitary[:,j]
    d = unitary[:,j+1]
    ref = hcat(a, b, c, d)' ./ 2

    evec[fill_at.x] = [-c -d a  b] * ref
    evals[fill_at.x] = (tridiagonal_elements[i] - tridiagonal_elements[j])^2 / 4
    fill_at.x += 1
    evec[fill_at.x] = [-c  d a -b] * ref
    evals[fill_at.x] = (tridiagonal_elements[i] + tridiagonal_elements[j])^2 / 4
    fill_at.x += 1
    evec[fill_at.x] = [-d -c b  a] * ref
    evals[fill_at.x] = (tridiagonal_elements[i] + tridiagonal_elements[j])^2 / 4
    fill_at.x += 1
    evec[fill_at.x] = [ d -c b -a] * ref
    evals[fill_at.x] = (tridiagonal_elements[i] - tridiagonal_elements[j])^2 / 4
    fill_at.x += 1
end

function ev_zero(tridiagonal_elements, unitary, evec, evals, fill_at; i)
    N = size(unitary, 1)
    ref = unitary[:, i]
    for idx in 1:N
        if idx != i
            rup = ref * unitary[:, idx]'
            evec[fill_at.x] = (rup - rup') ./ sqrt(2)
            evals[fill_at.x] = tridiagonal_elements[idx]^2 / 4
            fill_at.x += 1
        end
    end
    return (values = evals, vectors = evec)
end

function get_basis(M::Rotations{N}, p, B::DiagonalizingOrthonormalBasis{ℝ}) where N
    decomp = schur(B.frame_direction)

    trian_elem = get_tridiagonal_elements(decomp.T)
    unitary = decomp.Z
    evec = Vector{typeof(B.frame_direction)}(undef, manifold_dimension(M))
    evals = Vector{eltype(B.frame_direction)}(undef, manifold_dimension(M))
    i = 1
    fill_at = Ref(1)
    while i <= N
        if trian_elem[i] == 0
            evs = ev_zero(trian_elem, unitary, evec, evals, fill_at; i=i)
            i += 1
        else
            evs = ev_diagonal(trian_elem, unitary, evec, evals, fill_at, i=i)
            j = 1
            while j < i
                if trian_elem[j] == 0
                    j += 1
                else
                    evs = ev_offdiagonal(trian_elem, unitary, evec, evals, fill_at, i=i, j=j)
                    j += 2
                end
            end
            i += 2
        end
    end
    return CachedBasis(B, evals, evec)
end

For small N the main cost is the Schur decomposition (roughly 70% of the remaining time) and for large N it is dominated by the multiplications in ev_offdiagonal.
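
Here is roughly how to reproduce that measurement (a sketch, assuming the definitions above and that BenchmarkTools is installed):

using BenchmarkTools, LinearAlgebra, Manifolds
N = 50
X = randn(N, N); A = (X - X') / 2  # random skew-symmetric matrix
@btime schur($A)                   # cost of the decomposition alone
@btime get_basis(Rotations($N), nothing, DiagonalizingOrthonormalBasis($A))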

const-ae commented 2 years ago

Thanks @mateuszbaran for improving the code :)

@kellertuer

a PR of this would be neat! For that I would just ask for more documentation, and for the basis itself even with math formulae and a reference.

Okay, I will get on it.

const-ae commented 2 years ago

I have another question related to all this, concerning switching between manifolds.

To explain where I am coming from, a bit of context about the project that motivated me to learn more about differential geometry and geodesic regression: I have a number of data points $Y$ in Euclidean space that I want to fit (i.e. minimize the Euclidean distance between $Y$ and $\hat{Y}$). I assume that the data points are generated by some rotation whose degree depends on a known covariate $x$. I now want to find the rotation $R_0$ that corresponds to $x_i = 0$ and the direction of the rotation $V \in T_{R_0}M$.

I am trying to formulate my model as the minimization of $\frac{1}{2}\|Y - \hat{Y}\|_2^2$ where $\hat{Y}_i = \exp(R_0, x_i V)\, Z_i$.

Note that this is an extension of the orthogonal Procrustes problem.
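
In code, the cost I have in mind looks roughly like this (a sketch; the helper name is mine, Y and Z are g×n matrices, and x is the length-n covariate vector):

using Manifolds, LinearAlgebra
function rotation_regression_cost(S::Rotations, R0, V, Y, Z, x)
    n = size(Y, 2)
    # sum of squared Euclidean residuals between data and rotated representation
    return 0.5 * sum(norm(Y[:, i] - exp(S, R0, x[i] * V) * Z[:, i])^2 for i in 1:n)
end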


My problem now is that for the rotations manifold, the derivatives are affected by the transition from the rotations manifold to Euclidean space. The following example illustrates what I mean:

function manifold_distance_problem(S)
    p = random_point(S)
    q = random_point(S)
    v1 = random_tangent(S, p)
    v2 = random_tangent(S, p)
    cost_fnc(M, Vec1) = 0.5 * distance(S, q, exp(S, p, Vec1)).^2
    diff_fnc(M, Vec1, Vec2) = inner(S, exp(S, p, Vec1), -log(S, exp(S, p, Vec1), q), differential_exp_argument(S, p, Vec1, Vec2))
    check_diff(TangentSpace(S, p), cost_fnc, diff_fnc, p = v1, v = v2)
end

manifold_distance_problem(Sphere(6))
manifold_distance_problem(Rotations(3))

function euclid_distance_problem(S)
    p = random_point(S)
    q = random_point(S)
    v1 = random_tangent(S, p)
    v2 = random_tangent(S, p)
    cost_fnc(M, Vec1) = 0.5 * sum((q - exp(S, p, Vec1)).^2)
    diff_fnc(M, Vec1, Vec2) = sum(-(q - exp(S, p, Vec1)) .* differential_exp_argument(S, p, Vec1, Vec2))
    check_diff(TangentSpace(S, p), cost_fnc, diff_fnc, p = v1, v = v2)
end

euclid_distance_problem(Sphere(6))
euclid_distance_problem(Rotations(3))

This works for all but the last call euclid_distance_problem(Rotations(3)).

I think the reason why euclid_distance_problem() works for spheres is that, for small distances, moving along the sphere is the same as moving in the embedding Euclidean space:

$$\lim_{t\to 0} \frac{\log\bigl(\exp(p, x+tv),\, \exp(p, x)\bigr)}{t} = \lim_{t\to 0} \frac{\exp(p, x+tv) - \exp(p, x)}{t}.$$

However, for the rotations manifold this does not hold, because

$$\lim_{t\to 0} \frac{\log\bigl(\exp(p, x+tv),\, \exp(p, x)\bigr)}{t} = \lim_{t\to 0} \frac{\exp(p, x+tv)^\top \exp(p, x) - \exp(p, x)^\top \exp(p, x+tv)}{2t} \neq \lim_{t\to 0} \frac{\exp(p, x+tv) - \exp(p, x)}{t}.$$

My question is: what is happening in the transition when I go from a manifold to Euclidean space? Why doesn't the chain rule (as described in Section 4.7 of Boumal's book) apply for the rotations manifold? Or am I missing something obvious here?

mateuszbaran commented 2 years ago

I think euclid_distance_problem(Rotations(3)) most likely doesn't work because our Rotations manifold represents tangent vectors at the group identity, while Sphere uses the natural representation in the embedding. Anyway, if manifold_distance_problem works, then it doesn't seem to be a problem :slightly_smiling_face:.
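
A quick way to see the difference (a sketch, using the embed! overload from the top of this thread):

using Manifolds, Manopt, LinearAlgebra
S = Rotations(3)
p = random_point(S)
X = random_tangent(S, p)      # stored as a skew-symmetric matrix (identity representation)
norm(embed(S, p, X) - p * X)  # ≈ 0: embed translates X into the tangent space at p in the embedding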

const-ae commented 2 years ago

Thanks, I was considering that explanation. However, I am unclear at what point I would need to call embed to make euclid_distance_problem(Rotations(3)) work.

Anyway, if manifold_distance_problem works then it doesn't seem to be a problem 🙂 .

Well, I wish. However for my specific project, I rather need something that looks like euclid_distance_problem 😅

const-ae commented 2 years ago

Oh... @mateuszbaran you are right. Calling embed() resolves the issue:

function euclid_distance_problem_mod(S)
    p = random_point(S)
    q = random_point(S)
    v1 = random_tangent(S, p)
    v2 = random_tangent(S, p)
    cost_fnc(M, Vec1) = 0.5 * sum((q - exp(S, p, Vec1)).^2)
    diff_fnc(M, Vec1, Vec2) = sum(-(q - exp(S, p, Vec1)) .* embed(S, exp(S, p, Vec1), differential_exp_argument(S, p, Vec1, Vec2)))
    check_diff(TangentSpace(S, p), cost_fnc, diff_fnc, p = v1, v = v2)
end

euclid_distance_problem_mod(Sphere(6))
euclid_distance_problem_mod(Rotations(12))

The problem was that I always tried embed(S, p, differential_exp_argument(S, p, Vec1, Vec2)) instead of embed(S, exp(S, p, Vec1), differential_exp_argument(S, p, Vec1, Vec2)) 🙈

mateuszbaran commented 2 years ago

OK, cool, nice that you managed to solve the problem :slightly_smiling_face: .

kellertuer commented 2 years ago

Concerning the original problem in your research: I am not a fan of using Euclidean distances for manifolds (i.e. something in the embedding), especially because if it works, it works only very locally, and if the embedding is not isometric it probably will not work in general. So if possible, maybe use the manifold_distance_problem variant instead of the Euclidean one you posted – at least if you want to do manifold stuff; the Euclidean problem you could also solve with any Euclidean solver.

const-ae commented 2 years ago

Hey, thank you for the comment, I appreciate the input :) If I may ask, what would you suggest if I have a mixed problem, where data points in Euclidean space are rotated (as in the orthogonal Procrustes problem) and I want to find the rotation that is in some sense optimal? Is there an alternative way to phrase this problem so that I could use the manifold_distance_problem approach?

mateuszbaran commented 2 years ago

Yes, there is an alternative approach via a rotation action: https://juliamanifolds.github.io/Manifolds.jl/stable/manifolds/group.html#Rotation-action . You probably need to define your own action based on the ones we have, so it likely would not be that much work. We also have differentials of apply: https://juliamanifolds.github.io/Manifolds.jl/stable/manifolds/group.html#Manifolds.apply_diff-Tuple{AbstractGroupAction,%20Any,%20Any,%20Any} and https://juliamanifolds.github.io/ManifoldDiffEq.jl/stable/lie_group_solvers.html#ManifoldDiffEq.apply_diff_group-Tuple{AbstractGroupAction,%20Any,%20Any,%20Any}.

There is also https://juliamanifolds.github.io/Manifolds.jl/stable/manifolds/group.html#Manifolds.center_of_orbit which does some alignment but probably not the one you need.
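
A minimal sketch of what using a rotation action could look like (variable names are mine; see the linked docs for the exact constructors):

using Manifolds, Manopt
M = Euclidean(3)
G = SpecialOrthogonal(3)
A = RotationAction(M, G)
R = random_point(Rotations(3))  # a rotation, also a valid group element of G
x = randn(3)
y = apply(A, R, x)              # canonical left action: y == R * x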

kellertuer commented 2 years ago

Do you maybe have a reference with more mathematical details? If you are looking for the best rotation, I would first think of writing it with a cost function f(R) where R is from SO(n).

Maybe this Matlab example is what you are looking for?

https://www.manopt.org/reference/examples/generalized_procrustes.html

I can check whether one of my students already did that in Julia as well.

mateuszbaran commented 2 years ago

One point for the rotation action approach is that it allows for relatively easy generalization to rotations + translations (special Euclidean group), or some other group action that you might need.

const-ae commented 2 years ago

@mateuszbaran, the Rotation action things look great. Thank you so much for the links. I hadn't heard about this, but I will take a look :)


@kellertuer

https://www.manopt.org/reference/examples/generalized_procrustes.html

Thanks, my problem is an extension of the Procrustes problem, because I want the rotation to depend on some known covariate.

Do you maybe have a reference with more mathematical details?

Nothing properly written up; this is more something that my supervisor and I came up with during a brainstorming session. The gist of the idea is that we have $n$ measured data points $Y \in \mathbb{R}^{g \times n}$ and some known representation $Z \in \mathbb{R}^{g \times n}$. In addition, for each data point we know some covariate $t_i$. My cost function is $$\operatorname{cost}(p, V) = \sum_{i=1}^{n} \| Y_i - \exp(p, t_i V)\, Z_i \|^2$$ and I try to find $p \in SO(g)$ and $V \in T_p SO(g)$ that minimize $\operatorname{cost}(p, V)$.

kellertuer commented 2 years ago

Ok, I am a little stuck without more details about the real numbers (?) $t_i$ that you call covariate(s). And I have to think a little about what the gradient might look like here if you consider this on the tangent bundle of SO(g) (i.e. all tuples $(p, V)$ where $V \in T_pSO(g)$ for $p \in SO(g)$).

const-ae commented 2 years ago

without more details about the real numbers (?) t_i you call covariate

The approach is inspired by regular linear regression. In my specific case, $t$ could for example be a drug concentration, or a 0/1 vector assigning observations to a treatment group.

I have to think a little bit what the gradient here might look if you consider this on the Tangent bundle of SO(g) (i.e. all tuples (p,V) where V is in TpSO(g) for p in SO(g)).

Yes, similar to the example in the geodesic regression tutorial, I consider the tuple $(p, V)$ in the tangent bundle of SO(g).


I will try to provide some code soon, to illustrate what I am doing :)

kellertuer commented 2 years ago

Cool looking forward to that, it sounds really interesting.

mateuszbaran commented 2 years ago

@const-ae would you mind me adding the diagonalizing basis on SO(n) developed here to Manifolds.jl? It would be quite useful.

const-ae commented 2 years ago

No, go ahead – I am happy to see it being integrated :)

mateuszbaran commented 2 years ago

Great, thanks!

kellertuer commented 2 years ago

I think this issue is then resolved?

mateuszbaran commented 2 years ago

I think it is; in any case we can reopen it or make a new one if something more needs to be done.