CholeskyPreconditioner is much slower than unwrapped LLDL

johnbcoughlin commented 2 years ago

The CholeskyPreconditioner wrapper is about 600 times slower on a moderately sized problem:

N = 4096;
A = spdiagm(
  -1 => fill(-1.0, N - 1),
   0 => fill(2.0, N),
   1 => fill(-1.0, N - 1)
);
b = rand(N);
p = lldl(A, memory=2);
pslow = CholeskyPreconditioner(A, 2);

> @btime $p \ $b
  36.791 μs (3 allocations: 32.09 KiB)

> @btime $pslow \ $b
  24.392 ms (2 allocations: 32.08 KiB)

It seems that wrapping its L factor in a LowerTriangular type is not the way to go:

julia> @which p.L \ b
\(A::SparseArrays.AbstractSparseMatrixCSC, B::AbstractVecOrMat{T} where T) in SparseArrays at /Applications/Julia-1.6.app/Contents/Resources/julia/share/julia/stdlib/v1.6/SparseArrays/src/linalg.jl:1551

julia> @which pslow.L \ b
\(A::Union{LowerTriangular, UpperTriangular}, B::AbstractVector{T} where T) in LinearAlgebra at /Applications/Julia-1.6.app/Contents/Resources/julia/share/julia/stdlib/v1.6/LinearAlgebra/src/triangular.jl:1668

Any reason for wrapping it like this, or should it just be removed?

johnbcoughlin commented 2 years ago

I should note, I'm also observing that the CholeskyPreconditioner object requires more conjugate gradient iterations to converge on this problem. It suggests that the wrapper may be implementing the preconditioner interface incorrectly?

using IterativeSolvers
x, hist = cg(A, b, Pl=pslow, log=true, verbose=true)
# 102 iterations
x, hist = cg(A, b, Pl=p, log=true, verbose=true)
# 32 iterations

mohamed82008 commented 2 years ago

Thanks for the detailed report! Looks like LimitedLDLFactorizations got an efficient implementation for sparse lower triangular solves. I changed the implementation of CholeskyPreconditioner to become a light wrapper of LimitedLDLFactorizations instead to make use of their efficient implementations. https://github.com/mohamed82008/Preconditioners.jl/pull/24

JuliaLinearAlgebra / Preconditioners.jl

CholeskyPreconditioner is much slower than unwrapped LLDL #23