Performance: Optimize linear solve interface

ranocha commented 7 months ago

Right now, MPE uses

https://github.com/SKopecz/PositiveIntegrators.jl/blob/b821ff4664f1d2ccf983aefe9082b4140bd61269/src/mprk.jl#L329-L330

while MPRK22 still uses a simple backslash, e.g.,

https://github.com/SKopecz/PositiveIntegrators.jl/blob/b821ff4664f1d2ccf983aefe9082b4140bd61269/src/mprk.jl#L538

https://github.com/SKopecz/PositiveIntegrators.jl/blob/b821ff4664f1d2ccf983aefe9082b4140bd61269/src/mprk.jl#L548

We should use a unified interface. When doing so, we should consider the following aspects:

dolinsolve is generally faster for dense matrices and avoids some allocations
\ is currently faster for Tridiagonal matrices - but that is likely an upstream issue and should be fixed in LinearSolve.jl?
LinearSolve.jl may have some unwanted overhead since it sets up quite a lot of caches by default

For example, init can be quite slow (since it sets up a lot of unnecessary caches):

julia> using BenchmarkTools, LinearSolve, LinearAlgebra

julia> A, b = let n = 100
           Tridiagonal(rand(n - 1), 1 .+ rand(n), rand(n - 1)), rand(n)
       end;

julia> @benchmark \($A, $b)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.062 μs … 348.083 μs  ┊ GC (min … max): 0.00% … 97.92%
 Time  (median):     3.260 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.566 μs ±   5.281 μs  ┊ GC (mean ± σ):  3.57% ±  2.71%

julia> prob = LinearProblem(A, b)

julia> alg = LinearSolve.defaultalg(prob.A, prob.b, OperatorAssumptions(issquare(prob.A)))

julia> @benchmark \($A, $b)
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.057 μs … 185.594 μs  ┊ GC (min … max): 0.00% … 96.90%
 Time  (median):     3.281 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.548 μs ±   4.232 μs  ┊ GC (mean ± σ):  3.26% ±  2.72%

   ▂█▇▄▅▁                                                      
  ▂███████▅▄▃▃▃▃▃▃▃▃▃▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  3.06 μs         Histogram: frequency by time        5.36 μs <

 Memory estimate: 5.34 KiB, allocs estimate: 8.

julia> @benchmark init($prob, $alg)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  386.333 μs …   7.192 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     411.500 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   430.208 μs ± 139.476 μs  ┊ GC (mean ± σ):  1.86% ± 5.96%

  ▄▇█▅▂▁                                                        ▂
  ████████▆▅▅▅▅▄▅▅▄▅▅▆▆▅▅▅▅▅▃▄▆▆▅▅▆▆▅▄▅▅▃▄▅▅▃▅▅▃▅▅▃▄▃▄▃▃▅▃▄▄▄▁▄ █
  386 μs        Histogram: log(frequency) by time       1.05 ms <

 Memory estimate: 373.34 KiB, allocs estimate: 108.

julia> @benchmark solve!($(init(prob, alg)))
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.629 μs …   5.946 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.642 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.676 μs ± 218.897 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄ ▁                                                        ▁
  ████▄▁▁▁▁▁▁▃▁▃▁▄▃▁▃▁▁▁▁▁▁▁▄▁▃▁▁▃▇▅▅▆▆▆▇▆▄▆▆▁▆▄▅▃▁▁▅▃▄▃▄▃▅▄▄ █
  1.63 μs      Histogram: log(frequency) by time      2.82 μs <

 Memory estimate: 48 bytes, allocs estimate: 1.

SKopecz commented 5 months ago

Is this still an issue @ranocha ?

I cannot confirm that \ is currently faster for Tridiagonal matrices, see below.
The overhead of LinearSolve.jl is due to LinearSolve.defaultalg. Since we now explicitly set the default linear solver, this is no longer an issue.

I think we only need to decide if we want to use the function dolinsolve or another implementation.

julia> using BenchmarkTools, LinearSolve, LinearAlgebra

julia> A, b = let n = 100
                  Tridiagonal(rand(n - 1), 1 .+ rand(n), rand(n - 1)), rand(n)
              end;

julia> @benchmark \($A, $b)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.080 μs … 211.003 μs  ┊ GC (min … max): 0.00% … 97.21%
 Time  (median):     2.248 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.441 μs ±   2.962 μs  ┊ GC (mean ± σ):  2.37% ±  2.70%

    ▁▆█▇▅▃▂▁▁▂▂▁▁          ▁                             ▁    ▂
  ▆▇███████████████▆▆▇██████▇▇▇▅▄▁▄▆▅▇▇███▆▇▆▆▇▇█▇▇████▇▇██▇▇ █
  2.08 μs      Histogram: log(frequency) by time      4.08 μs <

 Memory estimate: 5.34 KiB, allocs estimate: 8.

julia> prob = LinearProblem(A, b);

julia> alg = LUFactorization()
LUFactorization{RowMaximum}(RowMaximum())

julia> @benchmark init($prob, $alg)
BenchmarkTools.Trial: 10000 samples with 162 evaluations.
 Range (min … max):  635.994 ns …   5.842 μs  ┊ GC (min … max): 0.00% … 58.72%
 Time  (median):     667.778 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   723.696 ns ± 347.267 ns  ┊ GC (mean ± σ):  6.31% ± 10.43%

  █▅▂▁                                                          ▁
  ██████▇▇▆▅▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▁▁▁▁▁▁▁▁▁▁▁▄▄▅▁▄▁▃▁▁▁▁▁▁▁▁▁▁▆██ █
  636 ns        Histogram: log(frequency) by time       3.15 μs <

 Memory estimate: 5.12 KiB, allocs estimate: 13.

julia> @benchmark solve!($(init(prob, alg)))
BenchmarkTools.Trial: 10000 samples with 21 evaluations.
 Range (min … max):  958.952 ns …  2.026 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     960.381 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   991.783 ns ± 98.342 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █       ▁       ▁  ▂   ▁    ▁   ▁    ▁    ▁          ▁       ▁
  █▃▇▇▅█████▄▃█▅▄▄█▄▁█▇▃▁█▅▃▁▁█▁▄▁█▄▁▁▁█▄▁▁▃█▁▁▁▁█▃▁▁▁▁█▁▃▁▁▁▇ █
  959 ns        Histogram: log(frequency) by time      1.46 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

ranocha commented 5 months ago

Is this still an issue @ranocha ?

I cannot confirm that \ is currently faster for Tridiagonal matrices, see below.

The overhead of LinearSolve.jl is due to LinearSolve.defaultalg. Since we now explicitly set the default linear solver, this is no longer an issue.

Yeah, it's basically fixed with https://github.com/SKopecz/PositiveIntegrators.jl/pull/60.

I think we only need to decide if we want to use the function dolinsolve or another implementation.

I agree, that's the last remaining decision. dolinsolve sets the preconditioners (requiring us to add the ugly hacks for them since we don't use them right now) and does a bunch of additional checks that we don't need for PositiveIntegrators.jl. I'm fine either way, though. What's your take on this - shall we just close this issue and continue using dolinsolve?

SKopecz commented 5 months ago

OK, let's stick with dolinsolve for now.

SKopecz / PositiveIntegrators.jl

Performance: Optimize linear solve interface #59