JuliaLinearAlgebra / LinearMaps.jl

A Julia package for defining and working with linear maps, also known as linear transformations or linear operators acting on vectors. The only requirement for a LinearMap is that it can act on a vector (by multiplication) efficiently.

Laziness of *(LinearMap, Matrix) #156

Open antoine-levitt opened 3 years ago

antoine-levitt commented 3 years ago

From the docs of Base.:*

Return the CompositeMap A*LinearMap(X), interpreting the matrix X as a linear operator, rather than a collection of column vectors. To compute the action of A on each column of X, call Matrix(A*X) or use the in-place multiplication mul!(Y, A, X[, α, β]) with an appropriately sized, preallocated matrix Y.

This is annoying when eg implementing block iterative solvers. Is there any reason this does that instead of just applying A on each column?

dkarrasch commented 3 years ago

This is very old default behavior of LinearMaps.jl, which decided to be on the "safe, lazy" side of things. There are two options:

  1. Wrap the call by convert(Matrix, A*X), which won't introduce any overhead if both A and X happen to be matrices anyway.
  2. Use 3-arg mul!(Y, A, X).

Due to the existence of mul! and its eager columnwise action I think there is some good reason for the lazy behavior as a default, because otherwise one would force people who want the lazy product to write A*LinearMap(X).
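A minimal sketch of both options (matrix sizes here are illustrative):

```julia
using LinearAlgebra, LinearMaps

A = LinearMap(rand(4, 4))   # wrap a plain matrix as a LinearMap
X = rand(4, 3)

# Option 1: materialize the lazy CompositeMap into a matrix.
Y1 = convert(Matrix, A * X)

# Option 2: eager, in-place columnwise action into a preallocated Y.
Y2 = similar(X)
mul!(Y2, A, X)
```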

antoine-levitt commented 3 years ago

Right but this breaks downstream users who just expect LinearMaps to behave as essentially sparse matrices (which is a very reasonable thing to do). Eg I can't write

function block_richardson(A, X, B)
    while iteration
        Xnext = X - alpha * (A * X - B)
    end
end

It's also confusing to me to have * be lazy but mul! be eager. If you want laziness for A*X you probably want laziness for X also, so it's reasonable to ask the user to do LinearMap(X). The rule op(lazy, lazy) => lazy and op(lazy, eager) => eager seems reasonable to me, no?

dkarrasch commented 3 years ago

There is no right or wrong here, just two points of view, and one which has been the default perhaps since the creation of the package. What's wrong with

function block_richardson(A,X, B)
    while iteration
        Xnext = X - alpha * (convert(AbstractMatrix, A * X) - B)
    end
end

? I mean, of course, one can change the *(::LinearMap, ::Matrix) behavior, but it's breaking.

Jutho commented 3 years ago

It would anyway be better to write something like

Xnext .= X .- alpha .* ( some_version_of(A*X) .- B)

In the original version, after having created AX = A*X as a matrix, you will create another copy to store AX - B, yet another copy to store alpha * (AX - B), and then the final copy to create Xnext = X - alpha * (AX - B). That's a whole lot of memory allocations that you could save on.

I guess we currently do not define the behaviour of LinearMaps under broadcasting, right? We could easily define broadcastable(A::LinearMap) = convert(AbstractMatrix, A).
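A hedged sketch of that suggestion (this definition is not part of the package; it is only the proposal spelled out, and defining it outside the package would be type piracy):

```julia
using LinearMaps

# Hypothetical definition, as proposed above:
Base.broadcastable(A::LinearMap) = convert(AbstractMatrix, A)

A = LinearMap(rand(3, 3))
B = rand(3, 3)
C = A .+ B   # A would be materialized once, then broadcast elementwise
```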

antoine-levitt commented 3 years ago

It's forcing LinearMap's idiosyncrasies onto solvers (which may not be designed with LinearMaps in mind), so it's kind of nullifying the point of LinearMap, which is (to me) to write a solver as if A was a matrix.

@Jutho the above snippet was of course a simplification. I use A*X here https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/eigen/lobpcg_hyper_impl.jl#L259 to create storage space. Of course I could preallocate then use mul!, but A*X makes perfect sense here.

Jutho commented 3 years ago

as if A was a matrix

I don't think this was ever the premise of LinearMaps.jl. In particular, Matrix or AbstractMatrix in Julia means an efficient getindex. That's why we explicitly don't have LinearMap <: AbstractMatrix.

antoine-levitt commented 3 years ago

"as if" is the key point here: it's essentially duck typing. LinearMaps quacks like a duck for all the operations it defines, except I believe for this one.

Jutho commented 3 years ago

I use A*X here https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/eigen/lobpcg_hyper_impl.jl#L259 to create storage space. Of course I could preallocate then use mul!, but A*X makes perfect sense here.

So it's really just this one line, cause in the loop you are also using mul!(AR, A, R) where you've allocated AR once with AR = zero(X). I don't see the big deal of first defining AX = similar(X) or AX = zero(X) and then doing mul!(AX, A, X). Personally, I like it if I see all the allocations together and explicit using calls like similar or zero, and then use a single operation (in this case mul!) rather than requesting custom types to support multiple methods (e.g. both mul! and *) to do essentially the same thing. But "personally" is certainly the key point here :-).
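For reference, the preallocate-then-mul! idiom described above looks like this (names are illustrative):

```julia
using LinearAlgebra, LinearMaps

A = LinearMap(rand(4, 4))
X = rand(4, 2)

AX = similar(X)   # the one explicit allocation, visible up front
mul!(AX, A, X)    # eager columnwise application, reusing AX thereafter
```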

Jutho commented 3 years ago

LinearMaps quacks like a duck for all the operations it defines, except I believe for this one.

I don't quite agree; this is the first line of the README: "Linear maps are also known as linear transformations or linear operators acting on vectors. The only requirement for a LinearMap is that it can act on a vector (by multiplication) efficiently."

If you define *(::LinearMap, ::AbstractMatrix) -> ::AbstractMatrix, then what do you do with *(::LinearMap, ::SparseMatrix)? Should it be a Matrix? Or a SparseMatrix? But it might not be sparse; the linear map could be a highly efficient function that does not have a sparse representation.

Or do you want *(::LinearMap, ::SparseMatrix) -> ::LinearMap and *(::LinearMap, ::Matrix) -> ::Matrix? Now that would be confusing I think. And what about all the other matrix types? I think with mul! we have a nice Julian way to get the operation that you want, without enforcing any LinearMap specifics, and for * the current behaviour is the one that works best in the most generic case.

antoine-levitt commented 3 years ago

If you define *(::LinearMap, ::AbstractMatrix) -> ::AbstractMatrix, then what do you do with *(::LinearMap, ::SparseMatrix).

I'd just define it the way it's defined now with AbstractVector. If for some reason you do *(::LinearMap, ::SparseMatrix) the result will be terrible, but if you want laziness you just have to wrap B in a LinearMap. The rule op(lazy, eager) = eager, op(lazy, lazy) = lazy seems reasonable to me. It's just weird to me to have this very special case be lazy, while both mul! and matvecs are eager.
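To illustrate the proposed rule (a sketch: the second line shows the proposal, not the package's current behavior):

```julia
using LinearMaps

A = LinearMap(rand(4, 4))
B = rand(4, 4)

A * LinearMap(B)   # op(lazy, lazy): lazy CompositeMap, as today
A * B              # op(lazy, eager): would eagerly return a Matrix under the rule
```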

I don't see the big deal of first defining AX = similar(X) or AX = zero(X) and then doing mul!(AX, A, X).

A*X is more idiomatic than AX = similar(X); mul!(AX, A, X), and simpler to read. It's not that big a deal (and we've just switched to that in the code), but still, would be nice.

Personally, I like it if I see all the allocations together and explicit using calls like similar or zero, and then use a single operation (in this case mul!) rather than requesting custom types to support multiple methods (e.g. both mul! and *) to do essentially the same thing. But "personally" is certainly the key point here :-).

I tend to think of packages like LinearMaps as "implement all the methods you can using this interface", so that I don't have to think about it. Agree that it comes down to preferences and intended use for LinearMaps.

Jutho commented 3 years ago

Indeed, I also agree it's very much a matter of opinion. Hence, one final opinion from my side :-).

The point of LinearMaps is to represent linear maps, and make their action on a vector as efficient as possible. The current interface offers the promise that it won't create or thus allocate new matrices for whatever kind of combinations you make with your linear maps and other linear maps (represented as dense matrices, sparse matrices, other linear maps, ...); only their action on a vector will of course create a new vector. If you were to instantiate *(::LinearMap, ::Matrix) as a matrix, then you should also do that with +(::LinearMap, ::Matrix) (which one could definitely argue for), but then what do you do with +(::LinearMap, ::UniformScaling)? I think it's highly unlikely that you want to instantiate that to a matrix, and it would be very unfortunate if you have to write A + LinearMap(I).
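For instance, assuming the lazy sum behaves as described above, the following stays allocation-free until the map acts on a vector:

```julia
using LinearAlgebra, LinearMaps

A = LinearMap(cumsum, 4)   # function-based map with no matrix representation
S = A + I                  # lazy sum with a UniformScaling; no matrix is built
S * ones(4)                # only the action allocates (a single vector)
```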

In my (very limited) mind set, block-Krylov methods are maybe the only place where you do actually want LinearMap * Matrix = Matrix (though that is probably my ignorance), and therefore it is not unexpected that there you need a specific step to say: now apply this linear map to this matrix, which is actually not really a linear map, but just a collection of vectors. Using mul! for that could not really be described as forcing LinearMap's idiosyncrasies onto the code I would say.

antoine-levitt commented 3 years ago

If you were to instantiate *(::LinearMap, ::Matrix) as a matrix, then you should also do that with +(::LinearMap, ::Matrix) (which one could definitely argue for), but then what do you do with +(::LinearMap, ::UniformScaling)?

I didn't even see that +(::LinearMap, ::Matrix) was defined. Personally I would just leave all of these undefined, and only define lazy operations between LinearMaps. LinearMaps are nontrivial data structures with large performance implications, and so I would be as explicit as possible. But I don't really use FunctionMap composition a lot so that might be my biases showing.

In my (very limited) mind set, block-Krylov methods are maybe the only place where you do actually want LinearMap * Matrix = Matrix (though that is probably my ignorance),

I think the difference is that you see matrices as operators by default, and I see matrices as bunches of vectors by default. There are lots of contexts where this makes sense - in my field of electronic structure, there are a bunch of electrons that each have their orbitals, which are naturally stored as one matrix, with each column being an electron.

Jutho commented 3 years ago

I agree a matrix can be much more, but my perspective is that, in the context of LinearMaps.jl, a Matrix is only one way to represent a LinearMap. Hence, combining a Matrix with a LinearMap, whether via multiplication, addition, or, through all the great work of @dkarrasch, concatenation, Kronecker products, etc., should not force the LinearMap to take a Matrix form. To clarify, I don't think of LinearMap as a special kind of Matrix; I think of Matrix, when it is combined with a LinearMap using any of those operations, as a special kind of LinearMap. Hence, to make *(::LinearMap, ::AbstractMatrix) return an AbstractMatrix, one would also have to consider what all these other operations need to do, and that would be very much breaking. I think the current behaviour is very consistent in that respect.

As for a list of vectors, because I typically deal with vectors which are not Vectors --like I support with KrylovKit.jl-- I would typically represent it as a Vector of vectors 😃 .

antoine-levitt commented 3 years ago

OK, it's not the design I would have hoped for, but I appreciate that it's internally consistent and useful for other tasks - thanks for the clarification.

As for a list of vectors, because I typically deal with vectors which are not Vectors --like I support with KrylovKit.jl-- I would typically represent it as a Vector of vectors

Yeah that's always an unpleasant ambiguity of figuring out the right layout... We went with matrix because of more efficient BLAS, typically for computing overlap matrices and such.

dkarrasch commented 3 years ago

I still think the convert approach is not too bad. As @Jutho pointed out, you can clearly state there what type of output array you want (avoiding any ambiguity) and it would be a noop for matrices. If you'd expect the output to be sparse, that's the spot where you convert(SparseMatrixCSC). Also, if you convert(AbstractMatrix) and happen to get a sparse matrix as output in the full matrix case, this is also a noop. This ties in pretty well with multiple dispatch and return type control, I believe.
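A sketch of that return-type control (assuming the conversions work as described above):

```julia
using SparseArrays, LinearMaps

A = LinearMap(sprand(5, 5, 0.3))
X = sprand(5, 2, 0.5)

Yd = convert(Matrix, A * X)           # request a dense result
Ys = convert(SparseMatrixCSC, A * X)  # request a sparse result
```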

JeffFessler commented 3 years ago

@antoine-levitt (and anyone else who hits this thread), just FYI part of the reason I developed https://github.com/JeffFessler/LinearMapsAA.jl was to make the operators behave more like an AbstractMatrix, including making *(::LinearMap, ::AbstractMatrix) return an AbstractMatrix. There is a whole table on the README over there that tries to spell out the combinations. Under the hood it is all built on the great work in LinearMaps, just using some different conventions for some operations.

antoine-levitt commented 3 years ago

Nice! Very cool that you can just reuse the LinearMaps machinery in an outside type.