JuliaLinearAlgebra / SemiseparableMatrices.jl

A Julia package to represent semiseparable and almost banded matrices
MIT License
9 stars 4 forks source link

Poor scaling of matrix multiplication for `UpperTriangular(AlmostBandedMatrix)` #50

Open DanielVandH opened 2 months ago

DanielVandH commented 2 months ago

Encountered this case where the performance of multiplication seemed to have really poor scaling:

using FillArrays, SemiseparableMatrices, BandedMatrices, LazyArrays
function gen_example(n)
    A = BandedMatrix(0 => [1; 1:n-1], 1 => Fill(-1.0, n - 1))
    B = ApplyMatrix(*, ones(n), OneElement(1.0, (1, n), (1, n)))
    AB = AlmostBandedMatrix(A, B)
    U = UpperTriangular(AB)
    L = LowerTriangular(brand(n, n, 2, 0))
    stats = @timed U * L
    return stats.time, stats.bytes
end
n = 2 .^ (0:11)
time_byts = gen_example.(n)
12-element Vector{Tuple{Float64, Int64}}:
 (2.7e-6, 64)
 (3.0e-7, 96)
 (1.4e-6, 704)
 (2.7e-6, 3648)
 (1.44e-5, 16512)
 (9.14e-5, 69760)
 (0.0005424, 286768)
 (0.0045723, 1163312)
 (0.0310835, 4685872)
 (0.240178, 18808880)
 (1.9579295, 75366448)
 (41.1409489, 301727792)

If I replace the stats = ... line with literally stats = @timed Matrix(U) * Matrix(L) so that all the structure is lost,

12-element Vector{Tuple{Float64, Int64}}:
 (0.0068448, 192)
 (0.0004204, 288)
 (6.15e-5, 704)
 (1.53e-5, 2112)
 (1.39e-5, 7424)
 (0.0007202, 26880)
 (0.0003808, 102416)
 (0.0053296, 401424)
 (0.0047358, 1589264)
 (0.0082311, 6324240)
 (0.0461804, 25231376)
 (2.2484057, 100794384)

If I instead keep stats = @timed U * L but do AB = Matrix(A + B) then

12-element Vector{Tuple{Float64, Int64}}:
 (0.0019402, 64)
 (3.8e-6, 96)
 (1.0e-6, 192)
 (1.7e-6, 576)
 (3.5e-6, 2176)
 (0.0004226, 8320)
 (0.000128, 32816)
 (0.0002281, 131120)
 (0.0006641, 524336)
 (0.0032373, 2097200)
 (0.0220077, 8388656)
 (0.1559884, 33554480)

If I do AB = A + B, the timing is instant since the multiplication is left lazy.

DanielVandH commented 2 months ago

Actually, the LowerTriangular can be removed, i.e. use L = brand(n, n, 2, 0) instead, and the issue still shows. So I guess the issue is the UpperTriangular wrapper on an AlmostBandedMatrix.

I tried tracing the call enough to find the cause but I can't seem to identify it. Any ideas?