SciML / LinearSolve.jl

LinearSolve.jl: High-Performance Unified Interface for Linear Solvers in Julia. Easily switch between factorization and Krylov methods, add preconditioners, and all in one interface.
https://docs.sciml.ai/LinearSolve/stable/
Other
241 stars 52 forks source link

Cache initialization of default algorithms #442

Closed SKopecz closed 8 months ago

SKopecz commented 8 months ago

The following example demonstrates for StaticArrays that DefaultAlgorithmChoice.LUFactorization is significantly slower than explicitly requesting an LUFactorization. The main reason for this is that in https://github.com/SciML/LinearSolve.jl/blob/e60a10a2e51c4d675cfe15dafaef1f43cd709d97/src/default.jl#L296 caches for all possible default algorithms are initialized and not just for the algorithm actually used. The flame graph below shows that over 40% of the time is unnecessarily spent in init_cacheval initializing GMRES, Cholesky-Factorization, SVD and so on. The problem also exists with other types of A and b, albeit to a lesser extent.

It would be great if only the cache for the actual algorithm could be initialized.

I would also be interested to know why a distinction is made between DefaultAlgorithmChoice.LUFactorization and LUFactorization at all.

julia> using StaticArrays, LinearSolve, LinearAlgebra, BenchmarkTools

julia> A = @SMatrix [1.0 2.0; 3.0 4.0];
julia> b = @SVector [3.0; 7.0];
julia> prob = LinearProblem(A,b);

julia> sol1 = solve(prob);
julia> sol1.alg
LinearSolve.DefaultLinearSolver(LinearSolve.DefaultAlgorithmChoice.LUFactorization)

julia> solver = LUFactorization();
julia> sol2 = solve(prob, solver);
julia> sol2.alg
LUFactorization{RowMaximum}(RowMaximum())

julia> @btime solve($prob)
  3.637 μs (41 allocations: 4.27 KiB)
retcode: Default
u: 2-element Vector{Float64}:
 0.9999999999999997
 1.0000000000000002

julia> @btime solve($prob, $solver)
  84.237 ns (2 allocations: 288 bytes)
retcode: Default
u: 2-element Vector{Float64}:
 0.9999999999999997
 1.0000000000000002

grafik

CC @ranocha

ChrisRackauckas commented 8 months ago

It's done in that way because with arrays it would be type-unstable to choose the method based on size and such. But with StaticArrays, you have the information. I think we could specialize SArray/MArray directly to LUFactorization/SVDFactorization based on size @avik-pal ?

avik-pal commented 8 months ago

Ah that is the reason to create a cache for all types. Yeah I will get it fixed.

avik-pal commented 8 months ago

Should be fixed now

SKopecz commented 8 months ago

Great, thanks!