JuliaManifolds / Manopt.jl

🏔️ Manopt.jl – Optimization on Manifolds in Julia
http://manoptjl.org

FixedRankMatrices representation of manifold points and tangent vectors #61

Closed qrebjock closed 3 years ago

qrebjock commented 3 years ago

Hello,

Thanks for this great toolbox! I have been using PyManopt so far, but I wanted to give Julia a try. The API is quite different, so it raised some questions for me.

I am trying to optimize over the manifold of fixed-rank matrices with a simple gradient descent algorithm (and trust regions later). From what I understand, in the gradient plan GradientDescentOptions the point on the manifold and the gradient need to have the same type (annotated with ::P). But for fixed-rank matrices the points are represented with the struct SVDMPoint and the tangent vectors with the struct UMVTVector. So I end up getting errors when I try to run gradient_descent with a simple cost function, and I don't really understand what I am doing wrong here. I am pretty new to Julia, so I might be missing something obvious.

More generally, I was wondering if you had examples with other manifolds than the Sphere, and in particular with the FixedRankMatrices manifold?

Two unrelated questions (I can open other issues if needed):

Thank you very much!

kellertuer commented 3 years ago

Hi, thank you for the kind words, and great that you are interested! Usually the GradientOptions would “know” the type P, since the constructors are typed similarly (i.e. it would automatically be SVDMPoint in your case).

Can you post your current approach or a small example where FixedRankMatrices fails? It's often easier to start from an MWE, and that way I can also see which errors you are getting. I can also look for an easy problem on FixedRankMatrices, but if you have one, that's even easier. I will check for an easy starting example with FixedRankMatrices later today.

Concerning your points:

kellertuer commented 3 years ago

Hm, from a quick look around I mostly saw gradients where the Euclidean gradient was implemented and a conversion used. As I said, this conversion is not yet available here, but we could also work out how to do that together, if you like. If I approach it, I would use the machinery of the EmbeddedManifold, because what the conversion basically does is convert from one metric (in the embedding) to another (of the manifold) and from one representation (for example the rank-r matrix) to another (the reduced SVD). So I would like to define it for a manifold and its embedding in full generality.
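For the simplest embedded case the conversion above reduces to a projection: assuming the metric is inherited from the embedding, the Riemannian gradient is the orthogonal projection of the Euclidean gradient onto the tangent space. A minimal stand-alone sketch on the unit sphere (the helper names egrad, proj, and rgrad are just for this illustration):

```julia
using LinearAlgebra

# Cost f(p) = p[1]^2, restricted to the unit sphere in R^3.
egrad(p) = [2 * p[1], 0.0, 0.0]    # Euclidean gradient in the embedding
# Tangent-space projection at p on the sphere: X ↦ X - ⟨p, X⟩ p
proj(p, X) = X - dot(p, X) * p
rgrad(p) = proj(p, egrad(p))       # Riemannian gradient by projection

p = normalize([1.0, 1.0, 1.0])
X = rgrad(p)
@assert abs(dot(p, X)) < 1e-12     # the result is tangent at p
```

The same pattern (reconstruct in the embedding, then project) is what a fixed-rank conversion would have to do, with the extra step of translating between the full-matrix and three-factor representations.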

Here are a few tips for your error.

qrebjock commented 3 years ago

Hi,

Thank you very much for your answers!

Here is the code I have been trying:

using LinearAlgebra
using Manopt, Manifolds
using Random
import Manopt: random_point, random_tangent

m = 5
n = 5
k = 2

function random_point(M::FixedRankMatrices{m, n, k}) where {m, n, k}
    U = random_point(Stiefel(m, k))
    S = sort(rand(k), rev=true)
    V = random_point(Stiefel(n, k))
    return SVDMPoint(U, S, Array(V'))
end

function random_tangent(M::FixedRankMatrices{m, n, k}, p::SVDMPoint) where {m, n, k}
    Up = randn(m, k)
    Vp = randn(n, k)
    A = randn(k, k)

    return UMVTVector(Up - p.U * p.U' * Up, A, Vp' - Vp' * p.Vt' * p.Vt)
end

F(M, p) = norm(p.U * diagm(p.S) * p.Vt - A)^2 / 2

# Riemannian gradient
function gradF(M, p::SVDMPoint)
    dS = diagm(p.S)
    return UMVTVector(
        -(I - p.U * p.U') * A * p.Vt',
        dS - p.U' * A * p.Vt',
        -p.U' * A * (I - p.Vt' * p.Vt)
    )
end

A = randn(m, n);
M = FixedRankMatrices(m, n, k)
X0 = random_point(M)

X = gradient_descent(M, F, gradF, X0)

It's simply trying to find the best rank-k approximation of some matrix. The error it raises is

MethodError: Cannot `convert` an object of type UMVTVector{Array{Float64,2},Array{Float64,2},Array{Float64,2}} to an object of type SVDMPoint{Array{Float64,2},Array{Float64,1},Array{Float64,2}}

From what I understand it's happening because in the struct GradientDescentOptions the point on the manifold and the gradient are supposed to have the same type ::P. But they don't because one is represented with SVDMPoint and the other with UMVTVector.
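The clash can be reproduced in isolation. A minimal sketch with hypothetical stand-in types (not the actual Manopt.jl definitions) shows why a single shared type parameter forces both fields to agree:

```julia
# Illustrative stand-ins for a point type and a tangent vector type.
struct PointLike
    data::Float64
end
struct TangentLike
    data::Float64
end

# One shared parameter P for both fields, mirroring the situation described above.
struct OptionsSketch{P}
    x::P         # the current iterate
    gradient::P  # forced to the same type as x
end

OptionsSketch(PointLike(1.0), PointLike(2.0))  # fine: both fields are PointLike
ok = try
    OptionsSketch(PointLike(1.0), TangentLike(2.0))  # no single P fits both
    false
catch e
    e isa MethodError
end
@assert ok
```

With matrix-represented manifolds, point and gradient are both plain matrices, so the shared parameter is never noticed; SVDMPoint vs. UMVTVector is what exposes it.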

Thanks for your remarks regarding AD and recordings! This will be helpful.

Regarding the conversion of the gradient in Manopt.m and PyManopt they have these egrad2rgrad functions that project the Euclidean gradient on the tangent space. I guess the same thing could be done here? I would actually be interested to look into it but I guess that's secondary at this point.

Kind regards, Quentin

kellertuer commented 3 years ago

Thank you for the example. Coming back to your initial comment: no, you are not missing anything. You maybe over-typed a little (types that are “clear from context” can be left out; they are inferred automatically).

And yes, this is a bug. At first glance, I might have to change the GradientOptions constructor signature for that (since I need a Manifold to infer the type of a tangent vector). I will try to fix it soon, thanks for noticing. This happened because all other manifolds up to now work directly on matrix representations, where point and tangent vector then have the same type.
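One direction such a fix could take (sketched with illustrative stand-in types, not the actual Manopt.jl signature) is to give the options struct separate type parameters for the point and the tangent vector, so both representations can coexist:

```julia
# Illustrative stand-ins, not the actual Manopt.jl types.
struct Pt
    data::Float64
end
struct Tv
    data::Float64
end

# Separate parameters for point and tangent vector.
struct OptionsFixed{P,T}
    x::P         # e.g. an SVDMPoint-like point
    gradient::T  # e.g. a UMVTVector-like tangent vector
end

o = OptionsFixed(Pt(1.0), Tv(0.0))  # now the two types may differ
@assert o.x isa Pt && o.gradient isa Tv
```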

kellertuer commented 3 years ago

I just linked a PR that should fix your problem, with just two further remarks.

1) The way you called gradient_descent, it would use the exponential map, which is not available for fixed-rank matrices. This is actually one point where Manopt.jl, or more precisely Manifolds.jl, differs from Manopt and pymanopt: the handling of retractions and inverse retractions (and similarly vector transports). Retractions always have an AbstractRetractionMethod assigned, where the default (ExponentialRetraction()) just falls back to calling exp. But your code, with the last line written as

X = gradient_descent(M, F, gradF, X0; retraction_method=PolarRetraction())

works just fine.
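The dispatch pattern behind that keyword can be sketched in isolation. The types and functions below are illustrative stand-ins mirroring the pattern, not the actual Manifolds.jl definitions:

```julia
using LinearAlgebra

# Stand-in retraction method types (illustrative only).
abstract type RetractionMethodSketch end
struct ExpRetractionSketch <: RetractionMethodSketch end
struct ProjRetractionSketch <: RetractionMethodSketch end

# On the unit sphere the exponential map is available ...
sphere_exp(p, X) = cos(norm(X)) * p + sin(norm(X)) * (X / norm(X))
# ... so the default method just falls back to exp,
retr(p, X, ::ExpRetractionSketch) = sphere_exp(p, X)
# while another method dispatches to a cheaper first-order retraction.
retr(p, X, ::ProjRetractionSketch) = normalize(p + X)

p = [1.0, 0.0, 0.0]
X = [0.0, 0.1, 0.0]  # tangent at p
@assert norm(retr(p, X, ExpRetractionSketch())) ≈ 1.0
@assert norm(retr(p, X, ProjRetractionSketch())) ≈ 1.0
```

For a manifold without exp, only the second kind of method is defined, which is why the keyword has to be passed explicitly for FixedRankMatrices.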

2) You implemented random_point and random_tangent for fixed-rank matrices. Would you like to contribute them to the code base? Surely the random functions here are merely a start, and there is a longer project to provide distributions on manifolds (see JuliaManifolds/Manifolds.jl#57), but until that is available I provide the poor man's variants random_point and random_tangent for certain types here. I would be happy if you provide your randoms as a PR; try to adopt the documentation from the others and write a test for each. And if you get stuck somewhere, let's discuss that in the PR.

Let me know whether the PR JuliaManifolds/Manifolds.jl#62 fixes your problem.

mateuszbaran commented 3 years ago

Manifolds.jl actually has some basic support for Riemannian gradients: https://juliamanifolds.github.io/Manifolds.jl/stable/features/differentiation.html#Manifolds.gradient-Tuple{Manifold,Any,Any,AbstractRiemannianDiffBackend} . It's not quite polished yet, though, nor integrated with Manopt.jl, and Riemannian Hessians are currently still only planned: https://github.com/JuliaManifolds/ManifoldDiff.jl/issues/29 .

kellertuer commented 3 years ago

Here's a small example of how to write your own recording action, just not very spectacular in its results yet: https://manoptjl.org/previews/PR63/tutorials/AdvancedRecord.html

qrebjock commented 3 years ago

Hi,

Sorry for the late answer; I was making sure that I understood everything before replying.

Thanks for the PR, it's working now! (with the PolarRetraction object)

Thanks also for the links for AD and recordings!

I will open a pull request to add the two functions that you mentioned. But before doing this, I want to clarify something. The UMVTVector struct represents a tangent vector in compact form. I am confused by the documentation here https://juliamanifolds.github.io/Manifolds.jl/stable/manifolds/fixedrankmatrices.html because it says that the tangent vector is U M V^T. I think that a UMVTVector(U_p, M, V_p) object should actually represent the vector U M V^T + U_p V^T + U V_p^T, where X = U S V^T is the current point. Is that right?

If you confirm this then I think there is a problem in the function project!(M::FixedRankMatrices, Y::UMVTVector, p::SVDMPoint, X::UMVTVector). I think there might also be an issue with the retraction retract!(::FixedRankMatrices{M,N,k}, q::SVDMPoint, p::SVDMPoint, X::UMVTVector, ::PolarRetraction). I am not totally sure about it yet, I am still looking into it.

I can make the required changes but I would need you to confirm that I understand the representation of the tangent vectors correctly.

Once all of this is set up I will be able to try trust regions on FixedRankMatrices.

kellertuer commented 3 years ago

Thanks for the PR, it's working now! (with the PolarRetraction object)

Great!

I will open a pull request to add the two functions that you mentioned.

Nice, looking forward to seeing that.

But before doing this, I want to clarify something. The UMVTVector struct represents a tangent vector in compact form. I am confused by the documentation here https://juliamanifolds.github.io/Manifolds.jl/stable/manifolds/fixedrankmatrices.html because it says that the tangent vector is U M V^T. I think that a UMVTVector(U_p, M, V_p) object should actually represent the vector U M V^T + U_p V^T + U V_p^T, where X = U S V^T is the current point. Is that right?

Yes, that's right; my mistake when I wrote the documentation. Feel free to also include that in the PR: my UMVTVector documentation is wrong there, and the one written in the docstring of FixedRankMatrices at https://juliamanifolds.github.io/Manifolds.jl/stable/manifolds/fixedrankmatrices.html#Manifolds.FixedRankMatrices is the right one.

If you confirm this then I think there is a problem in the function project!(M::FixedRankMatrices, Y::UMVTVector, p::SVDMPoint, X::UMVTVector).

Can you provide more detail or include the fix? I think I implemented that based on Bart's paper (see the reference at the bottom of that page).

I think there might also be an issue with the retraction retract!(::FixedRankMatrices{M,N,k}, q::SVDMPoint, p::SVDMPoint, X::UMVTVector, ::PolarRetraction). I am not totally sure about it yet, I am still looking into it.

Same comment as for the projection: I am not saying the current one is right, especially since I never used it; I just had the time and transferred the formulae (and it runs correctly in the test cases, which for sure does not make it correct).

I can make the required changes but I would need you to confirm that I understand the representation of the tangent vectors correctly.

I confirm your understanding is correct, see again the docs of FixedRankMatrices where the tangent vector form/representation is stated correctly.
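The confirmed representation can also be checked numerically. A stand-alone sketch (pure LinearAlgebra, not using the Manifolds.jl types): components (U_p, M, V_p) at a point X = U S V^T stand for the ambient matrix U M V^T + U_p V^T + U V_p^T, with U' U_p = 0 and V' V_p = 0, and such a matrix is fixed by the tangent-space projection:

```julia
using LinearAlgebra

m, n, k = 6, 5, 2
U = Matrix(qr(randn(m, k)).Q)   # orthonormal factors of the point
V = Matrix(qr(randn(n, k)).Q)
Mmat = randn(k, k)
Up = randn(m, k); Up -= U * (U' * Up)   # enforce U' * Up = 0
Vp = randn(n, k); Vp -= V * (V' * Vp)   # enforce V' * Vp = 0

Z = U * Mmat * V' + Up * V' + U * Vp'   # ambient form of the tangent vector
# Orthogonal projection onto the tangent space at X (cf. Vandereycken's paper):
PU, PV = U * U', V * V'
projZ = PU * Z * PV + (I - PU) * Z * PV + PU * Z * (I - PV)
@assert norm(projZ - Z) < 1e-10         # Z is fixed by the projection, i.e. tangent
```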

qrebjock commented 3 years ago

Thanks for the clarifications!

Here is the code I used for these tests:

using LinearAlgebra
using Manopt, Manifolds
using Random
import Manopt: random_point, random_tangent
using Plots 

m = 10
n = 10
k = 5

function random_point(M::FixedRankMatrices{m, n, k}) where {m, n, k}
    U = random_point(Stiefel(m, k))
    S = sort(rand(k), rev=true)
    V = random_point(Stiefel(n, k))
    return SVDMPoint(U, S, Array(V'))
end

function random_tangent(M::FixedRankMatrices{m, n, k}, p::SVDMPoint) where {m, n, k}
    Up = randn(m, k)
    Vp = randn(n, k)
    A = randn(k, k)

    return UMVTVector(Up - p.U * p.U' * Up, A, Vp' - Vp' * p.Vt' * p.Vt)
end

function to_ambient(X::SVDMPoint, T::UMVTVector)
    return X.U * T.M * X.Vt + T.U * X.Vt + X.U * T.Vt
end

function to_ambient(X::SVDMPoint)
    return X.U * diagm(X.S) * X.Vt
end

F(M, p) = norm(p.U * diagm(p.S) * p.Vt - A)^2 / 2

function gradF(M, p::SVDMPoint)
    dS = diagm(p.S)
    return UMVTVector(
        -(I - p.U * p.U') * A * p.Vt',
        dS - p.U' * A * p.Vt',
        -p.U' * A * (I - p.Vt' * p.Vt)
    )
end

function retr(M, X, W)
    QU, RU = qr([X.U W.U])
    QV, RV = qr([X.Vt' W.Vt'])
    T = svd(RU * [diagm(X.S) + W.M I; I zeros(k, k)] * RV')
    return SVDMPoint(QU * T.U[:, 1:k], T.S[1:k], T.Vt[1:k, :] * QV')
end

function hessF(M, X, H)
    dSinv = diagm(1 ./ X.S)
    return UMVTVector(
        (I - X.U * X.U') * (X.U * H.M + H.U - A * H.Vt' * dSinv),
        H.M,
        (H.M * X.Vt + H.Vt - dSinv * H.U' * A) * (I - X.Vt' * X.Vt)
    )
end

A = randn(m, 50) * randn(50, n)
M = FixedRankMatrices(m, n, k)
X = random_point(M)
W = random_tangent(M, X)
G = gradF(M, X)
H = hessF(M, X, W)

ts = 10 .^ (range(-8, 0, length=100))
r = []

for t = ts
    append!(
        r,
        abs(
            F(M, retr(M, X, t * W))
            #F(M, retract(M, X, t * W, PolarRetraction()))
            - F(M, X)
            - t * dot(to_ambient(X, W), to_ambient(X, G))
            - t * t / 2 * dot(to_ambient(X, W), to_ambient(X, H))
        )
    )
end

plot(ts, r, xaxis=:log, yaxis=:log)
plot!(ts, ts .^ 3, xaxis=:log, yaxis=:log)

To compare with the polar retraction, you can just comment out F(M, retr(M, X, t * W)) and uncomment F(M, retract(M, X, t * W, PolarRetraction())).

kellertuer commented 3 years ago
  • [...] But the function project!(M::FixedRankMatrices, Y::UMVTVector, p::SVDMPoint, X::UMVTVector) reconstructs the tangent vector with the formula U M V^T and independently of the point to which it is attached.

That would not fit the current generic signature of project!. But yes, the current form is wrong, and I actually confused myself when documenting UMVTVector. The only solution, I think, is to remove this variant.

The variant with another point q for X could be given as a vector_transport_to using the ProjectionTransport type. That would be rigorous and correct here. It would (roughly described) use q and X to reconstruct the full vector and then call the first project variant with that matrix.
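In ambient coordinates this transport can be sketched as follows (a minimal stand-alone illustration, not the Manifolds.jl implementation): take the full matrix of the vector at p and orthogonally project it onto the tangent space at q:

```julia
using LinearAlgebra

m, n, k = 6, 5, 2
Uq = Matrix(qr(randn(m, k)).Q)   # orthonormal factors of the target point q
Vq = Matrix(qr(randn(n, k)).Q)
Z = randn(m, n)                  # full (ambient) matrix of the vector at p

# Orthogonal projection onto the tangent space at q:
PU, PV = Uq * Uq', Vq * Vq'
Zq = PU * Z * PV + (I - PU) * Z * PV + PU * Z * (I - PV)
# Equivalent closed form: Z minus its component orthogonal to both factors.
@assert norm(Zq - (Z - (I - PU) * Z * (I - PV))) < 1e-10
```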

  • Here is why I think the retraction might not work. I checked the gradient and Hessian of a simple function with the technique described in sections 4.8 and 6.7 of Nicolas Boumal's book http://sma.epfl.ch/~nboumal/book/IntroOptimManifolds_Boumal_2020.pdf. These tests fail. At first I thought that the gradient and Hessian were wrong. But then I implemented the retraction described in section 7.5, equation 7.48. And with that one the tests pass. I can provide code if needed. Do you have a reference where the polar retraction for fixed-rank matrices is specified? So I can compare.

I had hoped I implemented Algorithm 6 by Vandereycken, but at first glance it does not look like it. I will check Nicolas' book for the tests; providing them here too might be a good idea. So the goal is to have Algorithm 6 from Vandereycken for this retraction. If you have the time, it would be great if you could fix that, too :) Especially

function retr(M, X, W)
    QU, RU = qr([X.U W.U])
    QV, RV = qr([X.Vt' W.Vt'])
    T = svd(RU * [diagm(X.S) + W.M I; I zeros(k, k)] * RV')
    return SVDMPoint(QU * T.U[:, 1:k], T.S[1:k], T.Vt[1:k, :] * QV')
end

Is this what you propose as the projection retraction? Documenting and adding this would be great, too.

I can surely help with a PR; a PR of my own is only possible in a few weeks, since I started a new position (in a new country) at the beginning of March. So: great that you work so much with Julia, I am happy to see this. If you have the time to fix/contribute, even better! Thanks for that.

(sorry I did not mean to close this issue)

kellertuer commented 3 years ago

Regarding the conversion of the gradient in Manopt.m and PyManopt they have these egrad2rgrad functions that project the Euclidean gradient on the tangent space. I guess the same thing could be done here? I would actually be interested to look into it but I guess that's secondary at this point.

Just an addendum to this: we have project (and project!) available in Manifolds.jl for a lot of (embedded) manifolds where the metric is inherited from the embedding (e.g. the sphere). So with that you can easily define a gradient by projection, though we do not have an egrad2rgrad function. I would also prefer to stick with the name project, even when doing the projection for fixed rank, for example, where the representation changes from the embedding of the matrices to the three factors.

kellertuer commented 3 years ago

This is further discussion and solved in JuliaManifolds/Manifolds.jl#340.