JuliaLinearAlgebra / IterativeSolvers.jl

Iterative algorithms for solving linear systems, eigensystems, and singular value problems
MIT License

Guaranteed memory usage (in GMRES) #127

Closed haampie closed 7 years ago

haampie commented 7 years ago

The current implementation of GMRES allocates too much memory. We should be able to guarantee that memory usage is fixed as O(restart * n), where n is the height of the matrix.

As a reference, I've coded up an improved GMRES version. The main difference is that it pre-allocates the V matrix once and reuses it, rather than throwing vectors away. The implementation is deliberately basic: it has no preconditioners and does not factorize the Hessenberg matrix.
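For context, the pre-allocation idea can be sketched as follows. This is illustrative only, not the package's implementation or my exact code; `gmres_prealloc` is a made-up name, there is no preconditioning, and the small least-squares problem is solved with a plain `\` instead of an incremental factorization:

```julia
using LinearAlgebra

# Sketch of restarted GMRES with the Krylov basis V and Hessenberg matrix H
# allocated once and reused across restarts, so memory stays O(restart * n).
function gmres_prealloc(A, b; restart = 15, maxiters = 1500, tol = 1e-8)
    n = length(b)
    V = zeros(n, restart + 1)        # Krylov basis, allocated once
    H = zeros(restart + 1, restart)  # Hessenberg matrix, allocated once
    x = zeros(n)
    r = copy(b)                      # residual b - A*x with x = 0
    iters = 0
    while iters < maxiters
        β = norm(r)
        β ≤ tol && break
        V[:, 1] .= r ./ β
        k = 0
        while k < restart && iters < maxiters
            k += 1; iters += 1
            w = A * V[:, k]
            for j = 1:k                          # Arnoldi: modified Gram-Schmidt
                H[j, k] = dot(view(V, :, j), w)
                w .-= H[j, k] .* view(V, :, j)
            end
            H[k + 1, k] = norm(w)
            iszero(H[k + 1, k]) && break         # happy breakdown
            V[:, k + 1] .= w ./ H[k + 1, k]
        end
        # Solve the small (k+1) × k least-squares problem with a plain `\`
        e₁ = zeros(k + 1); e₁[1] = β
        y = H[1:k+1, 1:k] \ e₁
        x .+= view(V, :, 1:k) * y
        r .= b .- A * x
    end
    return x
end
```

The point is that the only O(restart * n) storage is V, set up before the iteration starts; nothing per-iteration grows with n except the reused columns.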

Then I benchmarked it against the current version in IterativeSolvers.jl. GMRES does not converge for this particular matrix, which makes it possible to compare both versions over a large number of iterations.

For n = 5000 and 1500 iterations with restarts after 15 iterations, this is what I get:

IterativeSolvers.jl

BenchmarkTools.Trial:
  memory estimate:  423.87 MiB
  allocs estimate:  98640
  --------------
  minimum time:     579.899 ms (3.20% GC)
  median time:      581.539 ms (3.19% GC)
  mean time:        581.258 ms (3.24% GC)
  maximum time:     582.504 ms (3.41% GC)
  --------------
  samples:          9
  evals/sample:     1

Improved version

BenchmarkTools.Trial:
  memory estimate:  13.54 MiB
  allocs estimate:  20492
  --------------
  minimum time:     479.069 ms (0.00% GC)
  median time:      481.632 ms (0.31% GC)
  mean time:        481.409 ms (0.17% GC)
  maximum time:     482.786 ms (0.31% GC)
  --------------
  samples:          11
  evals/sample:     1

So memory usage is reduced substantially, as is the number of allocations, and there is probably room for further improvement.

Surprisingly, it is faster as well. Maybe factorizing the Hessenberg matrix on the fly is slower than a simple \.
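For what it's worth, the two strategies can be compared directly on the small (k+1) × k Hessenberg least-squares problem. `hessenberg_ls_givens` below is a hypothetical helper, not part of the package; it mimics what an incremental factorization ends up computing, while a plain `H \ rhs` reaches the same answer via a full QR:

```julia
using LinearAlgebra

# Hypothetical helper: solve min ‖H*y − β*e₁‖ for a (k+1) × k Hessenberg H
# by zeroing the subdiagonal with Givens rotations, as an on-the-fly
# factorization would do column by column.
function hessenberg_ls_givens(H, β)
    m, k = size(H)                  # m == k + 1
    R = copy(H)
    g = zeros(m); g[1] = β          # right-hand side β * e₁
    for i = 1:k
        G, _ = givens(R[i, i], R[i + 1, i], i, i + 1)
        lmul!(G, R)                 # zero the subdiagonal entry in column i
        lmul!(G, g)
    end
    return UpperTriangular(R[1:k, 1:k]) \ g[1:k]
end
```

Both routes give the same least-squares solution; the question raised above is only which one is faster for these small systems.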

With a smaller matrix I got some more samples (~10000):

> current, improved = benchmark(n = 500)
> judge(median(improved), median(current))
BenchmarkTools.TrialJudgement:
  time:   -41.32% => improvement (5.00% tolerance)
  memory: -73.43% => improvement (1.00% tolerance)


haampie commented 7 years ago

The same is basically true for cg.jl. I tested with a sparse matrix of size 10,000, converging in 194 iterations:

Current

BenchmarkTools.Trial:
  memory estimate:  15.30 MiB
  allocs estimate:  1026
  --------------
  minimum time:     562.136 ms (0.00% GC)
  median time:      575.174 ms (0.46% GC)
  mean time:        576.091 ms (0.32% GC)
  maximum time:     592.723 ms (0.53% GC)
  --------------
  samples:          9
  evals/sample:     1

Improved

BenchmarkTools.Trial:
  memory estimate:  312.97 KiB
  allocs estimate:  11
  --------------
  minimum time:     552.034 ms (0.00% GC)
  median time:      565.630 ms (0.00% GC)
  mean time:        562.870 ms (0.00% GC)
  maximum time:     572.558 ms (0.00% GC)
  --------------
  samples:          9
  evals/sample:     1
BenchmarkTools.TrialJudgement:
  time:   -1.66% => invariant (5.00% tolerance)
  memory: -98.00% => improvement (1.00% tolerance)
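A CG that allocates its work vectors once up front and updates everything in place might look like the sketch below. This is illustrative, not the package's code; `cg_prealloc` is a made-up name, and it assumes A is symmetric positive definite:

```julia
using LinearAlgebra

# Sketch of CG with all work vectors allocated before the loop; the
# iteration itself only does in-place updates and dot products.
function cg_prealloc(A, b; maxiters = 1000, tol = 1e-8)
    x = zeros(length(b))
    r = copy(b)            # residual b - A*x with x = 0
    p = copy(r)            # search direction
    Ap = similar(b)        # reused buffer for A * p every iteration
    ρ = dot(r, r)
    for _ = 1:maxiters
        mul!(Ap, A, p)                 # Ap = A * p, in place
        α = ρ / dot(p, Ap)
        x .+= α .* p
        r .-= α .* Ap
        ρ_new = dot(r, r)
        sqrt(ρ_new) ≤ tol && break
        p .= r .+ (ρ_new / ρ) .* p     # update direction in place
        ρ = ρ_new
    end
    return x
end
```

Since `mul!` works for sparse A as well, the handful of allocations before the loop is all the method ever needs.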

Hope to resolve this soon.

ChrisRackauckas commented 7 years ago

It's probably not showing a timing change because the test problem is small. With a larger test problem, memory differences this large would lead to huge timing differences.