JuliaLinearAlgebra / LinearMaps.jl

A Julia package for defining and working with linear maps, also known as linear transformations or linear operators acting on vectors. The only requirement for a LinearMap is that it can act on a vector (by multiplication) efficiently.

LinearMaps on non-AbstractVectors #44

Open antoine-levitt opened 5 years ago

antoine-levitt commented 5 years ago

It is often useful to define linear operators on arbitrary linear spaces, not necessarily AbstractVectors (e.g. the discretization of an n-d PDE). At the moment this is not supported by LinearMaps.

dkarrasch commented 5 years ago

What type of linear spaces do you have in mind? And which changes would be required in LinearMaps.jl? It could well be that what you're looking for is contained in @Jutho's KrylovKit.jl.

antoine-levitt commented 5 years ago

My application is with a 2-d array, but in general it could be any AbstractArray, or indeed a more general type of object, e.g. an ApproxFun or something. KrylovKit indeed looks interesting; I wasn't aware of it. Isn't it a duplication of effort to have solvers in both KrylovKit and IterativeSolvers?

Jutho commented 5 years ago

Isn't it a duplication of effort to have solvers

Probably, but it always seemed like my need for more general vectors than <:AbstractVector or even <:AbstractArray was not shared by anybody in the community, or by any package in the ecosystem. Julia is so great at powerful abstractions, yet many packages still write code with the same assumptions and the same kind of interface as if they were writing in C or other languages.

I am also working on OptimKit.jl :-).

antoine-levitt commented 5 years ago

Well, certainly how you devote your time and effort is none of my (or anyone's) business, and it is certainly more fun to write code than to get into the frame of mind of somebody else's code and fix its bugs. I just want to point out that it is annoying to users to have competing implementations of the same algorithm, each with different features, levels of maintenance, etc. I can't speak for IterativeSolvers, but over at https://github.com/JuliaNLSolvers/Optim.jl/ we certainly care about other input types than just Vectors (see e.g. https://github.com/JuliaNLSolvers/Optim.jl/blob/master/test/multivariate/array.jl). I'm not sure it has been tested with non-AbstractArrays, but I'm sure @pkofod would welcome a PR if it doesn't work.

I see that you support Riemannian optimization, great to see that! I added support for that to Optim. You seem to have been more careful about it than I was: I implemented the poor man's version, where you just pepper the unconstrained code with retractions and projections onto the tangent space. Yet that was enough to reach performance parity with other implementations for my use cases (LBFGS on the Stiefel manifold). You might also be interested in linking to https://github.com/JuliaNLSolvers/ManifoldProjections.jl/ (and if that is not enough for your use case, I would be interested in knowing why).

Jutho commented 5 years ago

The problem is, once you move away even from AbstractArray, you cannot assume anything anymore about how to take steps, i.e. add vectors, and so forth. In KrylovKit.jl, I still follow the typical approach of object-oriented languages: any object can be used as a vector, as long as it satisfies a certain interface already provided by Julia Base and LinearAlgebra (axpy!, dot, ...). I think that is reasonable, as all the objects in Krylov methods are vectors.
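For concreteness, here is a minimal sketch of such a duck-typed "vector" (the type and the exact set of methods are hypothetical; the precise interface KrylovKit.jl requires may differ, but the idea is to implement the LinearAlgebra methods rather than subtype AbstractVector):

```julia
using LinearAlgebra

# A 2-d field used as a "vector" purely by duck typing.
struct Field2D{T}
    data::Matrix{T}
end

Base.similar(v::Field2D) = Field2D(similar(v.data))
Base.copyto!(w::Field2D, v::Field2D) = (copyto!(w.data, v.data); w)
Base.:*(α::Number, v::Field2D) = Field2D(α * v.data)

LinearAlgebra.dot(v::Field2D, w::Field2D) = dot(v.data, w.data)
LinearAlgebra.norm(v::Field2D) = norm(v.data)
LinearAlgebra.rmul!(v::Field2D, α::Number) = (rmul!(v.data, α); v)
LinearAlgebra.axpy!(α::Number, v::Field2D, w::Field2D) = (axpy!(α, v.data, w.data); w)
```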

With optimization, that's a different kind of beast. The points over which you try to optimize can take values in an arbitrary set, say, a smooth manifold for the purpose of gradient-based optimization. The directions on this manifold are tangent vectors, so they are still like vectors, but the points on the manifold are not necessarily so. As such, I don't expect any interface and just enable the user to specify which method is used to take steps (retractions), to compute inner products between directions, to parallel-transport directions, and so forth.

Did I really need this? Yes. I am working with something known as tensor networks, where, in the simplest case, my manifold is specified by a single tensor. However, that representation is actually like a fiber bundle, i.e. there is some redundancy in the parameterisation which cannot easily be taken out. Furthermore, several quantities, like the function value and gradient, but even the inner product between tangent vectors, rely on auxiliary quantities that are implicitly determined by the tensor specifying the point on the manifold, so I would like to only compute them once and store them together with the tensor. There are different ways to update the tensor, and so forth.

With OptimKit, nothing needs to be known about how the user wants to encode their points x; it can be an arbitrary user type, a tuple of different objects, and so forth. Also, x could be a tuple that already contains the function value and gradient along with it, if they happen to be computed at the same time as the new point x is generated. There is complete freedom.
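A minimal sketch of that pattern (not OptimKit.jl's actual API): the solver receives the geometry as plain functions and never looks inside x or the gradient.

```julia
# `fg` computes (value, Riemannian gradient); `retract` and `inner` are supplied
# by the user for whatever type encodes points and tangent directions.
function riemannian_gradient_step(fg, x; retract, inner, stepsize = 1e-2)
    f, g = fg(x)                      # user computes value and Riemannian gradient
    x_new = retract(x, g, -stepsize)  # step along -g, however the user encodes points
    return x_new, f, inner(x, g, g)   # squared gradient norm in the user-supplied metric
end
```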

antoine-levitt commented 5 years ago

So what you're saying is that you have a manifold that you'd rather keep entirely intrinsic, instead of seeing it as embedded in a bigger linear space. This makes sense, but in the end you have to have a way to represent the tangent space as a vector and move in that direction. Is there really no way you can see that as retract(x+d), maybe by overloading +? I understand this is less pretty than your current approach, but it does make it possible to use other libraries.

Jutho commented 5 years ago

Not really though, + seems to imply commutativity and other properties which retract does not have. Also, especially while experimenting, I don't want to define new types for all kinds of different parameterisations which I can easily represent as a tuple of a number of standard types.

To give one example, one particular manifold is parameterized via an array A of size (D,d,D). However, it can also be made to satisfy a constraint, for example conj(AL[a,s,b])*AL[a,s,b'] = I[b,b'] (summing over a and s), meaning that U = reshape(AL, (D*d,D)) satisfies U'*U = I, so it is like a Stiefel manifold. However, the metric is still different. And another choice can be made, namely such that V = reshape(AR,(D,d*D)) satisfies V*V' = I. These two choices are related by a matrix C such that AL[:,s,:]*C = C*AR[:,s,:]. Sometimes it is beneficial to have access to all three variables AL, C and AR, though AL or AR by itself completely determines the other two (via a non-trivial eigenvalue problem that you don't want to solve more than once).

Having AL, a tangent vector BL can be chosen such that reshape(BL,(D*d,D))' * reshape(AL,(D*d,D)) = 0. This implies that AL + epsilon*BL still satisfies the required condition, up to order epsilon^2. But then, it is actually better to use a retraction which ensures that this condition is satisfied exactly, e.g. define the antisymmetric matrix K = reshape(BL,(D*d,D)) * reshape(AL,(D*d,D))' - reshape(AL,(D*d,D)) * reshape(BL,(D*d,D))' and apply the Cayley map `(I + alpha/2 K)/(I - alpha/2 K) * reshape(AL,(D*d,D))` or the exponential `exp(alpha K) * reshape(AL,(D*d,D))`.
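A sketch of that Cayley retraction in plain Julia, using the names and sizes from above (AL of size (D,d,D), tangent BL with reshape(BL,(D*d,D))'*reshape(AL,(D*d,D)) ≈ 0):

```julia
using LinearAlgebra

function cayley_retract(AL::Array{<:Number,3}, BL::Array{<:Number,3}, alpha::Real)
    D, d, _ = size(AL)
    UA = reshape(AL, (D * d, D))                               # isometry: UA' * UA ≈ I
    UB = reshape(BL, (D * d, D))                               # tangent direction
    K = UB * UA' - UA * UB'                                    # skew-Hermitian generator, size (D*d, D*d)
    UAnew = (I + alpha / 2 * K) * ((I - alpha / 2 * K) \ UA)   # Cayley map applied to UA
    return reshape(UAnew, (D, d, D))
end
```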

To experiment with all these different choices, it is much easier to just redefine the retraction function than to define new types, overload + in a very unnatural way, and so forth.

The manifold is in fact embedded in a larger vector space; it just happens to be the case that that vector space has a dimension which scales as d^L, where I am working in the limit L = +Inf. So my tangent vector B is also just a parameterisation of a larger vector in that infinite-dimensional Hilbert space.

antoine-levitt commented 5 years ago

That's fair enough. I guess Optim could change every retract(x+d) to retract(x,d) then.

But then, it is actually better to use a retraction which ensures that this condition is satisfied exactly, e.g. define the antisymmetric matrix K = reshape(BL,(D*d,D)) * reshape(AL,(D*d,D))' - reshape(AL,(D*d,D)) * reshape(BL,(D*d,D))' and apply the Cayley map `(I + alpha/2 K)/(I - alpha/2 K) * reshape(AL,(D*d,D))` or the exponential `exp(alpha K) * reshape(AL,(D*d,D))`.

FWIW, I experimented quite a bit with this on the Stiefel manifold, and every retraction pretty much gave the same result, so I ended up with simply a Cholesky-based orthogonalization of x+d, which was the fastest.
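For reference, a minimal sketch of that Cholesky-based retraction, assuming x and d are stored as tall matrices (valid while x + d has full column rank):

```julia
using LinearAlgebra

function cholesky_retract(x::AbstractMatrix, d::AbstractMatrix)
    y = x + d
    R = cholesky(Hermitian(y' * y)).U   # y = Q * R with Q' * Q = I
    return y / R                        # the orthonormal factor Q
end
```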

pkofod commented 5 years ago

That's fair enough. I guess Optim could change every retract(x+d) to retract(x,d) then.

A more general interface is fine by me. It probably wouldn't buy us anything in the current setup, but it wouldn't hurt either.

Jutho commented 5 years ago

Well, retract is only part of the story. Also my directions, i.e. tangent vectors, can be of arbitrary type, and so I also define a custom inner product, a custom scalar multiplication and addition, and a custom vector transport. Feel free to make any changes you like, but my package (which is WIP, not extensively tested and certainly not registered) grew organically from my needs and for now I'll just stick to it.

pkofod commented 5 years ago

I think you may have misunderstood my intentions :) I’m not trying to “win you over” (though I’d welcome collaboration). It’s just that we are going to extract the manifold projections out of Optim into a separate package so we can also use them elsewhere. That’s why I’m interested in your API: to see if there’s something to learn.

dkarrasch commented 5 years ago

Regarding the original issue, does it make sense to work on it within LinearMaps.jl? Dropping the AbstractVector requirement in the `mul!`s yields an ambiguity in our current framework: we interpret `L::LinearMap * A::AbstractMatrix` as a CompositeMap. But in the spirit of this thread, this could mean "apply L to the vector A". We also have

`mul!(Y::AbstractMatrix, A::LinearMap{T}, X::AbstractMatrix)`

being interpreted as "apply A columnwise to X". And in fact, this interpretation was requested by IterativeSolvers for applying A to subspaces spanned by the columns of X, consistent with the usual A*X. In summary, I'm not sure there is an easy adaptation of LinearMaps.jl to non-AbstractVectors.
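For illustration, the two interpretations side by side (behavior as described in this thread; the exact return types may differ between LinearMaps.jl versions):

```julia
using LinearMaps, LinearAlgebra

L = LinearMap(cumsum, 4)   # a 4×4 map defined only through its action on a vector
X = rand(4, 3)

L * X                      # composition: yields a lazy LinearMap, not a matrix

Y = zeros(4, 3)
mul!(Y, L, X)              # column-wise application: Y[:, j] = L * X[:, j]
```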

antoine-levitt commented 5 years ago

Hm, this is tricky. LinearMap could parametrically depend on the type (or shape) of its inputs, so that it only dispatches to the block interpretation when it makes sense? This ambiguity exists elsewhere in the ecosystem: A*X, when X is a matrix, is sometimes interpreted as column-wise application of A (when A is a matrix) and sometimes as A acting on the full X (when A is an FFT plan, for instance).
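For example (using FFTW.jl):

```julia
using FFTW

X = rand(ComplexF64, 4, 4)
A = rand(ComplexF64, 4, 4)
A * X              # ordinary matrix product: A applied to each column of X

p = plan_fft(X)    # a plan for a full 2-d FFT
p * X              # the plan acts on all of X at once, not column by column
```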

Jutho commented 5 years ago

@pkofod , that is indeed a great idea.

The difference is indeed that the manifolds I am facing are typically not embedded in a vector space that you want to deal with directly, as those spaces are exponentially large or truly infinite-dimensional. Hence, the last thing I want to do, or can do, is describe my state as a full vector in that space.

Hence, I have an intrinsic parameterization, as remarked by @antoine-levitt. But furthermore, this parameterization is not a set of coordinates, as there is some redundancy in the description, and it is not easy to get rid of this, especially not globally or explicitly. The correct mathematical picture is that my parameter space corresponds to the total space of a fiber bundle, whereas my physical manifold is the base space of that fiber bundle, and is itself embedded in an infinite-dimensional vector space.

Jutho commented 5 years ago

Regarding the original issue: I think much of the rest of the ecosystem would need to be modified for this to work, and it also doesn't play well with the current interface. What is the size of a linear map if it is not acting on vectors? If you go down this route, I think KrylovKit.jl is what you end up with, and at that point, I think just accepting arbitrary functions as linear maps is what you want.

I always found the LinearMaps solution for computing eigs with arbitrary functions that implement the action of the linear map suboptimal compared to Matlab, where eigs just accepts a custom function. But then, I was also annoyed by other aspects of ARPACK and the eigs interface, among them the fact that the vectors really need to be <:AbstractVector, or even Vector.

I certainly hope that the ecosystem transitions to such a much more general approach, as I am not a numerical analyst and writing Krylov methods or optimization algorithms is not my area of expertise, but I was at the point where it was more work to fit my needs into the interface of existing packages than to just write my own package.

antoine-levitt commented 5 years ago

I think just accepting arbitrary functions as linear maps is what you want.

I don't have a strong opinion on that. On the one hand, functions are certainly the most flexible and simplest interface. Indeed, I only came to LinearMaps because I wanted to wrap my function into a form IterativeSolvers would like. On the other hand, there are a number of variants (e.g. in-place or not, is the operator Hermitian, is the objective function differentiable, etc.) that are convenient to abstract away into a type (LinearMaps, or the types in Optim.jl for objective functions). A related question is how to represent the vectors you need, e.g. how do you store a history in GMRES or an approximate Jacobian in Broyden methods, without assuming they are Vectors, or vec-ing them.
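For instance, LinearMaps.jl lets you attach such properties when wrapping a function (keyword names as in its function-map constructor; check the documentation for the version you use):

```julia
using LinearMaps

f!(y, x) = (y .= 2 .* x; y)   # stand-in for an expensive, in-place, Hermitian operator
L = LinearMap(f!, 4; ismutating = true, ishermitian = true)
```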

Jutho commented 5 years ago

an approximate Jacobian in Broyden methods, without assuming they are Vectors, or vec-ing them.

By Jacobian do you mean Hessian (which is indeed the Jacobian of the gradient)? For large-scale optimization problems you don't want to store it as a matrix anyway, no? In limited-memory implementations, you just need a list of previous gradients, so any list (Vector) of custom objects is okay, no? The same goes for a history in GMRES (though I'm not sure what a history in GMRES is, or what the point of keeping it would be).

antoine-levitt commented 5 years ago

Hessian when you want to solve an optimization problem, Jacobian when you want to solve F(x) = 0. For BFGS you store it as a matrix, but I agree there's no real point in BFGS when you can do LBFGS. I'm more used to storing the history as an N x m matrix (where N is the dimension of the problem and m the history size), because that leads to more locality in memory and more efficient linear algebra, but I remember discussions to the effect that this didn't really matter, so I guess a list of objects is fine too. In GMRES you just store the reduced matrix, so it's less of a problem.

Jutho commented 5 years ago

It is true that you miss some opportunity for squeezing out maximal efficiency in Krylov-type algorithms by storing, e.g., the Krylov subspace as a list of vectors instead of as a matrix. In particular, you cannot use BLAS level 2 in the orthogonalization step, whereas you could if you had the basis available as a matrix (that is, if you are okay with using non-modified Gram-Schmidt). In KrylovKit.jl, I currently have some custom pure-Julia (multithreaded) methods that try to mimic BLAS level 2 when the vectors are <:DenseArray; I assume the better solution is indeed to store these as a matrix and employ BLAS directly. That is on the to-do list. Other than that, I don't see many disadvantages. Block Krylov methods might be another; I don't know much about those.
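For reference, a classical (non-modified) Gram-Schmidt step against a basis stored as the columns of a matrix reduces to two BLAS level-2 calls:

```julia
using LinearAlgebra

# Orthogonalize w against the (orthonormal) columns of V in place.
function orthogonalize!(w::AbstractVector, V::AbstractMatrix)
    h = V' * w              # all inner products with the basis at once (gemv)
    mul!(w, V, h, -1, 1)    # w .-= V * h, again a single gemv
    return w, h
end
```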

JeffFessler commented 4 years ago

Related to the original question, v0.6 of https://github.com/JeffFessler/LinearMapsAA.jl now provides a type (LinearMapAO, where O is for Operator) that supports linear mappings between AbstractArrays of arbitrary dimensions, quite similar to that of https://github.com/hakkelt/FunctionOperators.jl. If A::LinearMapAO has "input dimension" (2,3) and "output dimension" (4,5,6) and you do A*X where X is an AbstractArray of dimension (2,3,7,8), then the output will be an Array of dimension (4,5,6,7,8). In other words, it works block-wise, perhaps as alluded to by @antoine-levitt above. If you really want a new LinearMapAO, rather than an Array, then you must first wrap X in a LinearMapAO. This is a deliberate departure from the non-Matrix-like behavior in LinearMaps, where A*X produces a new LinearMap, as discussed by @dkarrasch above.
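A rough sketch of that block-wise behavior in plain Julia (not LinearMapsAA.jl's actual API), just to make the reshaping explicit:

```julia
# Apply a matrix A of size prod(odim) × prod(idim) to the leading dimensions of X.
function apply_blockwise(A::AbstractMatrix, X::AbstractArray;
                         idim = (2, 3), odim = (4, 5, 6))
    trailing = size(X)[length(idim)+1:end]          # e.g. (7, 8)
    Xmat = reshape(X, prod(idim), prod(trailing))   # flatten input blocks into columns
    Ymat = A * Xmat                                 # ordinary matrix-matrix product
    return reshape(Ymat, odim..., trailing...)      # e.g. size (4, 5, 6, 7, 8)
end

# e.g. apply_blockwise(rand(120, 6), rand(2, 3, 7, 8)) has size (4, 5, 6, 7, 8)
```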