JuliaLinearAlgebra / LinearMaps.jl

A Julia package for defining and working with linear maps, also known as linear transformations or linear operators acting on vectors. The only requirement for a LinearMap is that it can act on a vector (by multiplication) efficiently.
Other
303 stars 42 forks source link

Unnecessary allocation for mat-vec multiplication of composite map #193

Closed felixhorger closed 1 year ago

felixhorger commented 1 year ago

Hi, very useful package! I work with huge vectors (~30Gb) and encountered a problem with composite maps. (A * B) * x gives the same as A * (B * x) as expected, but the former allocates twice the space. A and B are of type FunctionMap and have their own output arrays, e.g. A = LinearMap{T}(x -> zeros(42), 42). In such a case, the output is copied to another allocated array, see this function. In my case that meant that around 50% of the time to compute (A * B) * x was spend on allocating memory or copying. Is there any way to prevent this? A couple of my naive attempts to solve this failed. In above example I could use the ismutating option, but my case is more complicated and that doesn't work anymore.

Cheers, Felix

dkarrasch commented 1 year ago

Hi @felixhorger! Thanks for the report. I believe the relevant piece of code for you is

https://github.com/JuliaLinearAlgebra/LinearMaps.jl/blob/bd10c6c74ada158d6b820694ccde21235ed01538/src/composition.jl#L176-L182

As you say, when you have a non-mutating LinearMap, then _unsafe_mul! essentially computes B*x, then copies the result to source (which was created when calling the function), and then computes the second factor A*source, which is only then copied to the final y. A quickfix could be to overload the "two-map" _composite_mul!. The issue is quite generic for outofplace FunctionMaps, because every call to _unsafe_mul! will create twice the necessary memory.

Perhaps a more sustainable solution is to make FunctionMap have a type parameter for whether it's an inplace-function or not. Then one could define a mul!(y, A::CompositeMap{<:Any,Tuple{Vararg{FunctionMap{<:Any,false}}}}, x) method that just computes A[i] * x consecutively. Or define a restricted LinearMap type that doesn't even define mul!, but only *.

Which other features do you need for your map, beyond map-vec-multiplication?

felixhorger commented 1 year ago

Apologies for the late reply, a conference deadline kept me from pursuing this! Yes that would help! Basically I have a set of operators A, B, ... which are multiplied to form one composite map A B ... . They are mutating in that they either allocated their own memory internally, i.e. write into the same output array, or mutate the input array. This is very bad practice (massive side effects ...) but the only way to speed things up (> 30GB arrays). I can't declare these as mutating, because then the input vector is copied which defeats the purpose.

I think the methods being called must be changed, i.e. if an operator is not mutating, _unsafe_mul! is not used (because if would double-allocate and copy). Even in a side-effect free code this avoids one allocation+copy so would be beneficial.

dkarrasch commented 1 year ago

Can you try this with the current master branch? If #194 improved things, we should perhaps release it rather sooner than later.

felixhorger commented 1 year ago

Thanks for working towards a solution! I tried it on v3.10.0-DEV but it still allocates, here is a MWE to reproduce

using LinearMaps

N = 100_000_000
function plan()
    y = zeros(N)
    A = LinearMap{Float64}(
        x -> begin
            y .= x
            y
        end,
        N
    )
    return A, y
end

# Set up operators
A, ya = plan()
B, yb = plan()
x = zeros(N)

# Please run each twice to compile
@time u = (A * B) * x; # Allocates
@time v = A * (B * x); # doesn't, but is equivalent

@assert u === ya # Fails, meaning that the vector returned from the composite isn't the value returned from A, but a copy
@assert v === ya # is true

I should add that this code has a massive side-effect (y) but that's required for my application of having exceedingly large vectors (> 30GB).

dkarrasch commented 1 year ago

Aha, I missed a detail. Should be fixed by #200.

felixhorger commented 1 year ago

Yes, that works like a charm, thanks! :)

When I test it with the above snippet, the @assert statements are both true and I can only observe a small 80 bytes allocation when using the composite which is acceptable.