CG preconditioned by AMG (CG-AMG) using AMGCL

ddemidov / amgcl

C++ library for solving large sparse linear systems with algebraic multigrid method

http://amgcl.readthedocs.org/

MIT License

723 stars 110 forks source link

CG preconditioned by AMG (CG-AMG) using AMGCL #114

Closed davidherreroperez closed 4 years ago

davidherreroperez commented 5 years ago

Hello,

I'm trying to implement a CG preconditioned by AMG (CG-AMG) as shown in Figures 17 and 18 of the following paper

https://asc.ziti.uni-heidelberg.de/sites/default/files/research/papers/public/NaArCa_15AmgX.pdf

in order to reduce the number of iterations taken to converge the CG solver. I would be very grateful if you can indicate me if the CG implementation can be used without applying any preconditioning in each iteration of the solver ( P.apply(r, s); in line 160 of amgcl/solver/cg.hpp ). I would like to use the AMGCL implementation to take advantage of parallelization functionalities.

Thanks in advance!

ddemidov commented 5 years ago

There is a 'dummy' preconditioner (apply is a no-op) designed specifically for this: https://github.com/ddemidov/amgcl/blob/master/amgcl/preconditioner/dummy.hpp.

You can use it in your code directly, or with a 'runtime' preconditioner, or just use examples/solver.cpp. The following examples solve a 3D Poisson problem on a 80^3 grid, but you could also provide a system matrix in the matrix market format:

$ ./examples/solver -n 80 -p solver.type=cg precond.class=amg
Solver
======
Type:             CG
Unknowns:         512000
Memory footprint: 15.62 M

Preconditioner
==============
Number of levels:    4
Operator complexity: 1.62
Grid complexity:     1.13
Memory footprint:    180.50 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0       512000        3545600    134.56 M (61.85%)
    1        64560        1903196     40.72 M (33.20%)
    2         4185         270731      4.89 M ( 4.72%)
    3          237          13051    336.19 K ( 0.23%)

Iterations: 17
Error:      5.50212e-09

[Profile:          1.380 s] (100.00%)
[ self:            0.004 s] (  0.27%)
[  assembling:     0.053 s] (  3.85%)
[  setup:          0.421 s] ( 30.48%)
[  solve:          0.903 s] ( 65.39%)

$ ./examples/solver -n 80 -p solver.type=cg solver.maxiter=1000 precond.class=dummy
Solver
======
Type:             CG
Unknowns:         512000
Memory footprint: 15.62 M

Preconditioner
==============
identity matrix as preconditioner
  unknowns: 512000
  nonzeros: 3545600

Iterations: 199
Error:      8.76374e-09

[Profile:          2.572 s] (100.00%)
[ self:            0.004 s] (  0.14%)
[  assembling:     0.059 s] (  2.29%)
[  setup:          0.020 s] (  0.78%)
[  solve:          2.490 s] ( 96.79%)

davidherreroperez commented 5 years ago

Many thanks for the prompt reply!

I can find in the literature that one of the advantages of using CG iterative method with multigrid preconditioner is that the number of iterations does not increase even when the mesh is finer, for instance

http://www.hpcs.cs.tsukuba.ac.jp/~tatebe/research/paper/CM93-tatebe.pdf

I'm doing many experiments in the context of linear elasticity with finite elements of order 1 modifying/refining the mesh and also solving the same problems using hypre, and these are the results

problem_size iters_1 iters_2 iters_hypre 650 1 20 13 2450 1 24 14 9506 304 35 15 37442 596 44 16 148610 1189 73 16 592130 2368 111 17

Experiment 1: AMGCL using "./solver --matrix A'SIZE'.mtx --rhs b'SIZE'.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.class=amg" Experiment 2: AMGCL using "./solver --matrix A'SIZE'.mtx --rhs b'SIZE'.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.class=amg precond.coarse_enough=25 precond.coarsening.type=ruge_stuben precond.coarsening.eps_strong=0.9 precond.relax.type=gaussseidel" Experiment 3: HYPRE CG preconditioned with AMG with the configuration indicated in the log files hypre'SIZE'.txt

https://www.dropbox.com/sh/mf38r0mew9eloli/AACumScoRpVqjo1VDks98wgIa?dl=0

Experiment 2 aims to emulate the configuration used by hypre in experiment 3.

I would be very grateful if you can say me if I'm not configuring properly a CG method preconditioned by AMG using AMGCL, or the reasons the number of iterations of CG method increase with the mesh size.

Thanks in advance!

ddemidov commented 5 years ago

It looks like the problem is either extremely anisotropic or convection-dominated (hence the need to set eps_strong=0.9), and thus is not suited very well for smoothed aggregation coarsening used in the first experiment.

Ruge-Stuben coarsening usually works better with such problems, and indeed, the convergence with your second set of parameters looks good enough to me. You could somewhat improve it by dropping precond.coarse_enough=25 (so that direct solver is applied at the finer level) and replacing gauss_seidel relaxation with ilu0:

$ ./solver -B -A A_37442.bin -f b_37442.bin -p solver.type=cg \
    precond.coarsening.type=ruge_stuben \
    precond.coarsening.eps_strong=0.9 \
    precond.relax.type=ilu0
...
Iterations: 20
Error:      4.72897e-09
...

$ ./solver -B -A A_592130.bin -f b_592130.bin -p solver.type=cg \
    precond.coarsening.type=ruge_stuben \
    precond.coarsening.eps_strong=0.9 \
    precond.relax.type=ilu0
...
Iterations: 70
Error:      7.93614e-09
...

What is the kind of problem you are solving here? Knowing this could help to further improve convergence.

I have to say I am impressed with hypre performance here, which seems to be related to the HMIS coarsening they use (unavailable in amgcl).

davidherreroperez commented 5 years ago

Thanks for your comments!

What is the kind of problem you are solving here? Knowing this could help to further improve convergence.

I'm solving a plane strain linear elasticity problem. The model is a 2d cantilever with structured mesh and constant material properties. All the displacements of the left side are constrainted and a uniform vertical load is applied to the right side. The resulting displacements are as follows

scantilever2d

I have to say I am impressed with hypre performance here, which seems to be related to the HMIS coarsening they use (unavailable in amgcl).

Just to mention that I was testing AMGX with HMIS coarsening and the number of iterations also increase with the mesh refinement, but the problem could be related to other reason ...

ddemidov commented 5 years ago

I'm solving a plane strain linear elasticity problem.

Does that mean you have two unknowns per grid point? In this case the matrix should have block structure with 2x2 blocks, but it does not look that way. Do you order unknowns by grid points or by direction (x or y)? In the first case, we could either use point-wise aggregation coarsening or use block-valued matrix and rhs.

Also, providing near null space vectors (rigid body modes) usually helps with elasticity problems and aggregation coarsening.

ddemidov commented 5 years ago

After a second look at the matrix structure it does look like the unknowns are ordered by axis (x or y) rather than by grid point. Is it possible to switch the ordering?

davidherreroperez commented 5 years ago

Hi Denis

After a second look at the matrix structure it does look like the unknowns are ordered by axis (x or y) rather than by grid point. Is it possible to switch the ordering?

I have updated the dropbox link

https://www.dropbox.com/sh/mf38r0mew9eloli/AACumScoRpVqjo1VDks98wgIa?dl=0

including two folders: ordered_by_nodes and ordered_by_dim.

ordered_by_nodes: loop first over the nodes (inner loop) then over the vector dimension (outer loop); symbolically it can be represented as: XXX...,YYY...

ordered_by_dim: loop first over the vector dimension (inner loop) then over the nodes (outer loop); symbolically it can be represented as: XY,XY,XY,...

Taking a look at the experiments showed above using Hypre, I have to say that I used the ordered_by_dim format including the dimension of the problem using the HYPRE_BoomerAMGSetNumFunctions(dim) option ...

ddemidov commented 5 years ago

Using the information about block structure somewhat helps. Here is non-smoothed aggregation with node-wise coarsening:

./solver -A A_37442.mtx -f b_37442.mtx -p solver.{type=cg,maxiter=500} \
  precond.relax.type=ilu0 precond.coarsening.type=aggregation \
  precond.coarsening.aggr.block_size=2
Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    4
Operator complexity: 1.79
Grid complexity:     1.65
Memory footprint:    44.13 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        37442         667012     23.86 M (55.76%)
    1        18332         366176     13.03 M (30.61%)
    2         4978         132874      4.55 M (11.11%)
    3         1078          30092      2.70 M ( 2.52%)

Iterations: 346
Error:      9.31048e-09

[Profile:       3.325 s] (100.00%)
[  reading:     0.449 s] ( 13.52%)
[  setup:       0.085 s] (  2.56%)
[  solve:       2.790 s] ( 83.91%)

and here is the case when the system matrix is treated as having 2x2 block values:

./solver -A A_37442.mtx -f b_37442.mtx -p solver.{type=cg,maxiter=500} \
  precond.relax.type=ilu0 precond.coarsening.type=aggregation -b2
Solver
======
Type:             CG
Unknowns:         18721
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.07
Grid complexity:     1.08
Memory footprint:    58.89 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        18721         482505     40.12 M (93.22%)
    1         1470          35096     18.77 M ( 6.78%)

Iterations: 240
Error:      8.10336e-09

[Profile:       3.211 s] (100.00%)
[  reading:     0.449 s] ( 13.97%)
[  setup:       0.255 s] (  7.95%)
[  solve:       2.506 s] ( 78.06%)

Still, the Ruge-Stuben coarsening seems to be optimal (among supported by amgcl) for this kind of problem.

ddemidov commented 5 years ago

Sorry, I've looked at the wrong matrices. The correct set to use with block-wise coarsening/block valued matrix is the ordered_by_dim one (XY,XY):

./solver -A A_37442_dim.mtx -f b_37442_dim.mtx -p solver.{type=cg,maxiter=500} \
  precond.coarsening.type=aggregation precond.relax.type=ilu0 \
  precond.coarsening.aggr.block_size=2
Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.06
Grid complexity:     1.07
Memory footprint:    26.26 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        37442         667012     23.82 M (94.06%)
    1         2450          42148      2.44 M ( 5.94%)

Iterations: 81
Error:      9.31612e-09

[Profile:       0.814 s] (100.00%)
[  reading:     0.448 s] ( 55.11%)
[  setup:       0.036 s] (  4.47%)
[  solve:       0.328 s] ( 40.36%)

./solver -A A_37442_dim.mtx -f b_37442_dim.mtx -p solver.{type=cg,maxiter=500} \
  precond.coarsening.type=aggregation precond.relax.type=ilu0 -b2
Solver
======
Type:             CG
Unknowns:         18721
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.06
Grid complexity:     1.07
Memory footprint:    18.25 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        18721         166753     15.74 M (94.06%)
    1         1225          10537      2.52 M ( 5.94%)

Iterations: 82
Error:      7.50392e-09

[Profile:       0.724 s] (100.00%)
[  reading:     0.492 s] ( 67.88%)
[  setup:       0.021 s] (  2.85%)
[  solve:       0.211 s] ( 29.19%)

Performance-wise, these are almost compatible with Ruge-Stuben (also, note the difference in reported memory footprint; the RS is much more expensive here):

./solver -A A_37442.mtx -f b_37442.mtx -p solver.{type=cg,maxiter=500} \
  precond.coarsening.{type=ruge_stuben,eps_strong=0.9} \
  precond.relax.type=ilu0
Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    5
Operator complexity: 2.03
Grid complexity:     1.92
Memory footprint:    50.06 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        37442         667012     24.48 M (49.36%)
    1        18426         326064     11.99 M (24.13%)
    2         9211         197059      7.10 M (14.58%)
    3         4506         105972      3.73 M ( 7.84%)
    4         2167          55135      2.76 M ( 4.08%)

Iterations: 21
Error:      6.15272e-09

[Profile:       0.674 s] (100.00%)
[  reading:     0.443 s] ( 65.69%)
[  setup:       0.078 s] ( 11.64%)
[  solve:       0.152 s] ( 22.59%)

ddemidov commented 5 years ago

Final note: there is no need to use non-smoothed aggregation either:

./solver -A A_37442_dim.mtx -f b_37442_dim.mtx -p solver.{type=cg,maxiter=500} \
  precond.relax.type=ilu0 precond.coarsening.aggr.block_size=2
Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.06
Grid complexity:     1.07
Memory footprint:    30.21 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        37442         667012     27.77 M (94.03%)
    1         2450          42340      2.44 M ( 5.97%)

Iterations: 65
Error:      9.72811e-09

[Profile:       0.886 s] (100.00%)
[  reading:     0.455 s] ( 51.38%)
[  setup:       0.049 s] (  5.49%)
[  solve:       0.382 s] ( 43.07%)

./solver -A A_37442_dim.mtx -f b_37442_dim.mtx -p solver.{type=cg,maxiter=500} \
  precond.relax.type=ilu0 -b2
Solver
======
Type:             CG
Unknowns:         18721
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.06
Grid complexity:     1.07
Memory footprint:    20.02 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        18721         166753     17.51 M (94.03%)
    1         1225          10585      2.52 M ( 5.97%)

Iterations: 77
Error:      7.52487e-09

[Profile:       0.817 s] (100.00%)
[  reading:     0.477 s] ( 58.43%)
[  setup:       0.027 s] (  3.27%)
[  solve:       0.312 s] ( 38.23%)

davidherreroperez commented 5 years ago

Thanks for your comments!

Just one last question. When I use block_size>1 I get the following runtime error

terminate called after throwing an instance of 'std::runtime_error' what(): Unsupported block size

coming from line 354 of https://github.com/ddemidov/amgcl/blob/master/examples/solver.cpp

How can I compile with support for block_size>1?

ddemidov commented 5 years ago

You should define AMGCL_BLOCK_SIZES macro to something like (2)(3)(4). If you are using cmake to compile amgcl examples, then define cmake variable -DAMGCL_BLOCK_SIZES=2;3;4 to the same effect. The default is 3;4:

https://github.com/ddemidov/amgcl/blob/15df1d7be117fcd86ade3e9e7328417674401af7/examples/CMakeLists.txt#L12-L17

ddemidov commented 5 years ago

Had some time today to look at the largest matrix from your set. Smoothed aggregation coarsening from amgcl manages to converge in the same time HMIS from Hypre does (keeping in mind we ran the tests on different machines; mine has an Intel Core i7 920 CPU @ 2.67GHz). It makes more iterations, but each iteration is cheaper, since the hiererachy is much lighter memory-wise:

$ ./solver -B -A ./problems/issue-114/A_592130.bin -f ./problems/issue-114/b_592130.bin \
  -p solver.type=cg solver.maxiter=300 precond.relax.type=ilu0 -b2
Solver
======
Type:             CG
Unknowns:         296065
Memory footprint: 18.07 M

Preconditioner
==============
Number of levels:    3
Operator complexity: 1.07
Grid complexity:     1.07
Memory footprint:    298.57 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0       296065        2657665    278.53 M (93.74%)
    1        18721         166753     17.52 M ( 5.88%)
    2         1225          10585      2.52 M ( 0.37%)

Iterations: 110
Error:      8.61486e-09

[Profile:      10.466 s] (100.00%)
[ self:         0.011 s] (  0.10%)
[  reading:     0.151 s] (  1.44%)
[  setup:       0.640 s] (  6.11%)
[  solve:       9.665 s] ( 92.35%)

Compare this to Hypre numbers from your report (3.95s for setup; 9.33 for solve in 17 iterations). Also, we can use a GPU with amgcl:

$ OCL_DEVICE=K40 ./solver_vexcl_cl -B \
  -A ./problems/issue-114/A_592130.bin -f ./problems/issue-114/b_592130.bin \
  -p solver.type=cg solver.maxiter=300 precond.relax.type=ilu0 -b2
1. Tesla K40c (NVIDIA CUDA)

Solver
======
Type:             CG
Unknowns:         296065
Memory footprint: 18.07 M

Preconditioner
==============
Number of levels:    3
Operator complexity: 1.07
Grid complexity:     1.07
Memory footprint:    303.34 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0       296065        2657665    283.00 M (93.74%)
    1        18721         166753     17.79 M ( 5.88%)
    2         1225          10585      2.56 M ( 0.37%)

Iterations: 115
Error:      8.43753e-09

[Profile:       2.679 s] (100.00%)
[ self:         0.200 s] (  7.48%)
[  reading:     0.145 s] (  5.41%)
[  setup:       0.866 s] ( 32.31%)
[  solve:       1.468 s] ( 54.80%)

davidherreroperez commented 4 years ago

Hi Denis,

I have observed a significant increment in the number of iterations using mpi_amg as the number of non-zeros per row increases. In order to reproduce the problem, I have increased the degree of the finite elements (p-method) of this model

I'm solving a plane strain linear elasticity problem. The model is a 2d cantilever with structured mesh and constant material properties. All the displacements of the left side are constrainted and a uniform vertical load is applied to the right side.

combined with the refining of the mesh (h-method). The coefficient matrices (A2) and rhs vectors (b2) are located at

https://www.dropbox.com/sh/mf38r0mew9eloli/AACumScoRpVqjo1VDks98wgIa?dl=0

We can compare the results with two problems of 37442 unknows (18432 vs 4608 finite elements) with different sparsity of the coefficient matrix:

./solver -A ordered_by_dim/A_37442.mtx -f ordered_by_dim/b_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.relax.type=ilu0 precond.coarsening.aggr.block_size=2
Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
======
Number of levels:    2
Operator complexity: 1.06
Grid complexity:     1.07
Memory footprint:    30.21 M

level     unknowns       nonzeros      memory
   0        37442         667012     27.77 M (94.03%)
   1         2450          42340      2.44 M ( 5.97%)

Iterations: 65
Error:      9.72811e-09

[Profile:       0.922 s] (100.00%)
[  reading:     0.546 s] ( 59.21%)
[  setup:       0.050 s] (  5.44%)
[  solve:       0.326 s] ( 35.30%)

./solver -A ordered_by_dim/A2_37442.mtx -f ordered_by_dim/b2_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.relax.type=ilu0 precond.coarsening.aggr.block_size=2
Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.07
Grid complexity:     1.06
Memory footprint:    48.92 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        37442        1184260     44.29 M (93.27%)
    1         2398          85516      4.63 M ( 6.73%)

Iterations: 4619
Error:      9.98768e-09

[Profile:      38.110 s] (100.00%)
[  reading:     0.945 s] (  2.48%)
[  setup:       0.122 s] (  0.32%)
[  solve:      37.042 s] ( 97.20%)

./solver -A ordered_by_dim/A_37442.mtx -f ordered_by_dim/b_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.relax.type=ilu0 -b2
Solver
======
Type:             CG
Unknowns:         18721
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.06
Grid complexity:     1.07
Memory footprint:    20.02 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        18721         166753     17.51 M (94.03%)
    1         1225          10585      2.52 M ( 5.97%)

Iterations: 77
Error:      7.52487e-09

[Profile:       0.857 s] (100.00%)
[  reading:     0.558 s] ( 65.16%)
[  setup:       0.036 s] (  4.18%)
[  solve:       0.262 s] ( 30.61%)

./solver -A ../mtx/ordered_by_dim/A2_37442.mtx -f ../mtx/ordered_by_dim/b2_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=10000 precond.relax.type=ilu0 -b2
Solver
======
Type:             CG
Unknowns:         18721
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.07
Grid complexity:     1.06
Memory footprint:    33.16 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        18721         296065     27.85 M (93.24%)
    1         1199          21469      5.30 M ( 6.76%)

Iterations: 10000
Error:      13.9903

[Profile:      74.239 s] (100.00%)
[  reading:     0.967 s] (  1.30%)
[  setup:       0.094 s] (  0.13%)
[  solve:      73.178 s] ( 98.57%)

./solver -A ordered_by_nodes/A_37442.mtx -f ordered_by_nodes/b_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.coarsening.type=ruge_stuben precond.coarsening.eps_strong=0.9 precond.relax.type=ilu0
Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    5
Operator complexity: 2.03
Grid complexity:     1.92
Memory footprint:    50.06 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        37442         667012     24.48 M (49.36%)
    1        18426         326064     11.99 M (24.13%)
    2         9211         197059      7.10 M (14.58%)
    3         4506         105972      3.73 M ( 7.84%)
    4         2167          55135      2.76 M ( 4.08%)

Iterations: 21
Error:      6.15272e-09

[Profile:       0.804 s] (100.00%)
[  reading:     0.552 s] ( 68.68%)
[  setup:       0.086 s] ( 10.67%)
[  solve:       0.166 s] ( 20.60%)

./solver -A ordered_by_nodes/A2_37442.mtx -f ordered_by_nodes/b2_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.coarsening.type=ruge_stuben precond.coarsening.eps_strong=0.9 precond.relax.type=ilu0
Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    4
Operator complexity: 1.85
Grid complexity:     1.62
Memory footprint:    73.53 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        37442        1184260     40.37 M (54.16%)
    1        13846         662934     21.70 M (30.32%)
    2         6904         285918      9.47 M (13.07%)
    3         2305          53659      1.98 M ( 2.45%)

Iterations: 242
Error:      8.07748e-09

[Profile:       4.089 s] (100.00%)
[  reading:     0.944 s] ( 23.09%)
[  setup:       0.170 s] (  4.15%)
[  solve:       2.974 s] ( 72.75%)

I know that the number of iterations can increase as the sparsity also do it since the condition number is different, but they seem a lot of iterations. It does not converge using block values, I am probably doing something wrong. I solve this problem (A2_37442 and b2_37442) using a cg with Jacobi preconditioning taking 752 iterations. I would appreciate your opinion about this issue.

ddemidov commented 4 years ago

Looks like ilu0 is behaving badly in this case. spai0 fixes the convergence:

./solver -B -A A2_37442.bin -f b2_37442.bin -p solver.{type=cg,tol=1e-8,maxiter=200} \
    precond.relax.type=spai0 precond.coarsening.aggr.block_size=2
Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.07
Grid complexity:     1.06
Memory footprint:    30.25 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        37442        1184260     25.62 M (93.27%)
    1         2398          85516      4.63 M ( 6.73%)

Iterations: 104
Error:      8.10064e-09

[Profile:       0.953 s] (100.00%)
[  reading:     0.021 s] (  2.23%)
[  setup:       0.107 s] ( 11.20%)
[  solve:       0.824 s] ( 86.48%)

Not sure what happens with ilu0, but spai0 is much cheaper to setup and scales better across multiple cores/mpi processes, so if it works, I would just use it.

ddemidov commented 4 years ago

Another option is ilu(k) (with k=1 by default):

./solver -B -A A2_37442.bin -f b2_37442.bin -p solver.{type=cg,tol=1e-8,maxiter=200} \
    precond.relax.type=iluk precond.coarsening.aggr.block_size=2
Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.07
Grid complexity:     1.06
Memory footprint:    90.50 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        37442        1184260     85.87 M (93.27%)
    1         2398          85516      4.63 M ( 6.73%)

Iterations: 40
Error:      4.9139e-09

[Profile:       2.943 s] (100.00%)
[  reading:     0.022 s] (  0.74%)
[  setup:       2.182 s] ( 74.15%)
[  solve:       0.738 s] ( 25.08%)

ddemidov commented 4 years ago

And just to check that ilu0 is working as intended, I can reproduce bad convergence with iluk, k=0:

./solver -B -A A2_37442.bin -f b2_37442.bin -p solver.{type=cg,tol=1e-8,maxiter=5000} \
    precond.relax.{type=iluk,k=0} precond.coarsening.aggr.block_size=2
Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.07
Grid complexity:     1.06
Memory footprint:    48.92 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        37442        1184260     44.29 M (93.27%)
    1         2398          85516      4.63 M ( 6.73%)

Iterations: 4619
Error:      9.98768e-09

[Profile:      54.090 s] (100.00%)
[  reading:     0.021 s] (  0.04%)
[  setup:       0.377 s] (  0.70%)
[  solve:      53.691 s] ( 99.26%)

davidherreroperez commented 4 years ago

Not sure what happens with ilu0, but spai0 is much cheaper to setup and scales better across multiple cores/mpi processes, so if it works, I would just use it.

Thank you Dennis for the suggestion. It using spai0 the cg converges and it scales better across multiple mpi processes.

Another option is ilu(k) (with k=1 by default):

Just to mention that iluk (with k=1 by default) is not working properly using solver_vexcl_cl and solver_vexcl_cuda.

ddemidov commented 4 years ago

Just to mention that iluk (with k=1 by default) is not working properly using solver_vexcl_cl and solver_vexcl_cuda.

ILU* methods in AMGCL are applied approximately when on a GPGPU backend. That is, instead of doing a forward and backward sweep which has very low parallelism, a couple of Jacobi iterations are used to approximately solve the triangular systems. The default number of iterations is 2, but you can increase the number with precond.relax.solve.iters parameter:

Default number of iterations:

./solver_vexcl_cl -B -A A2_37442.bin -f b2_37442.bin -p solver.{type=cg,tol=1e-8} \
    precond.relax.type=iluk precond.coarsening.aggr.block_size=2
1. Tesla K40c (NVIDIA CUDA)

Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.07
Grid complexity:     1.06
Memory footprint:    90.48 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        37442        1184260     85.81 M (93.27%)
    1         2398          85516      4.66 M ( 6.73%)

Iterations: 100
Error:      163.396

[Profile:       5.688 s] (100.00%)
[ self:         1.171 s] ( 20.59%)
[  reading:     0.414 s] (  7.29%)
[  setup:       2.785 s] ( 48.97%)
[  solve:       1.317 s] ( 23.16%)

Increased number of iterations

./solver_vexcl_cl -B -A A2_37442.bin -f b2_37442.bin -p solver.{type=cg,tol=1e-8} \
    precond.relax.{type=iluk,solve.iters=5} precond.coarsening.aggr.block_size=2
1. Tesla K40c (NVIDIA CUDA)

Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.07
Grid complexity:     1.06
Memory footprint:    90.48 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        37442        1184260     85.81 M (93.27%)
    1         2398          85516      4.66 M ( 6.73%)

Iterations: 51
Error:      9.58117e-09

[Profile:       2.854 s] (100.00%)
[ self:         0.195 s] (  6.85%)
[  reading:     0.018 s] (  0.62%)
[  setup:       2.155 s] ( 75.53%)
[  solve:       0.485 s] ( 17.00%)

even more:

./solver_vexcl_cl -B -A A2_37442.bin -f b2_37442.bin -p solver.{type=cg,tol=1e-8} \
    precond.relax.{type=iluk,solve.iters=7} precond.coarsening.aggr.block_size=2
1. Tesla K40c (NVIDIA CUDA)

Solver
======
Type:             CG
Unknowns:         37442
Memory footprint: 1.14 M

Preconditioner
==============
Number of levels:    2
Operator complexity: 1.07
Grid complexity:     1.06
Memory footprint:    90.48 M

level     unknowns       nonzeros      memory
---------------------------------------------
    0        37442        1184260     85.81 M (93.27%)
    1         2398          85516      4.66 M ( 6.73%)

Iterations: 40
Error:      5.69973e-09

[Profile:       2.941 s] (100.00%)
[ self:         0.196 s] (  6.66%)
[  reading:     0.021 s] (  0.73%)
[  setup:       2.241 s] ( 76.21%)
[  solve:       0.482 s] ( 16.40%)

Of course, the additional iterations have cost, so you have to strike some kind of balance here.