Closed davidherreroperez closed 4 years ago
There is a 'dummy' preconditioner (apply is a no-op) designed specifically for this: https://github.com/ddemidov/amgcl/blob/master/amgcl/preconditioner/dummy.hpp.
You can use it in your code directly, or with a 'runtime' preconditioner, or just use examples/solver.cpp. The following examples solve a 3D Poisson problem on a 80^3 grid, but you could also provide a system matrix in the matrix market format:
$ ./examples/solver -n 80 -p solver.type=cg precond.class=amg
Solver
======
Type: CG
Unknowns: 512000
Memory footprint: 15.62 M
Preconditioner
==============
Number of levels: 4
Operator complexity: 1.62
Grid complexity: 1.13
Memory footprint: 180.50 M
level unknowns nonzeros memory
---------------------------------------------
0 512000 3545600 134.56 M (61.85%)
1 64560 1903196 40.72 M (33.20%)
2 4185 270731 4.89 M ( 4.72%)
3 237 13051 336.19 K ( 0.23%)
Iterations: 17
Error: 5.50212e-09
[Profile: 1.380 s] (100.00%)
[ self: 0.004 s] ( 0.27%)
[ assembling: 0.053 s] ( 3.85%)
[ setup: 0.421 s] ( 30.48%)
[ solve: 0.903 s] ( 65.39%)
$ ./examples/solver -n 80 -p solver.type=cg solver.maxiter=1000 precond.class=dummy
Solver
======
Type: CG
Unknowns: 512000
Memory footprint: 15.62 M
Preconditioner
==============
identity matrix as preconditioner
unknowns: 512000
nonzeros: 3545600
Iterations: 199
Error: 8.76374e-09
[Profile: 2.572 s] (100.00%)
[ self: 0.004 s] ( 0.14%)
[ assembling: 0.059 s] ( 2.29%)
[ setup: 0.020 s] ( 0.78%)
[ solve: 2.490 s] ( 96.79%)
Many thanks for the prompt reply!
I can find in the literature that one of the advantages of using CG iterative method with multigrid preconditioner is that the number of iterations does not increase even when the mesh is finer, for instance
http://www.hpcs.cs.tsukuba.ac.jp/~tatebe/research/paper/CM93-tatebe.pdf
I'm doing many experiments in the context of linear elasticity with finite elements of order 1 modifying/refining the mesh and also solving the same problems using hypre, and these are the results
problem_size iters_1 iters_2 iters_hypre 650 1 20 13 2450 1 24 14 9506 304 35 15 37442 596 44 16 148610 1189 73 16 592130 2368 111 17
Experiment 1: AMGCL using "./solver --matrix A'SIZE'.mtx --rhs b'SIZE'.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.class=amg" Experiment 2: AMGCL using "./solver --matrix A'SIZE'.mtx --rhs b'SIZE'.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.class=amg precond.coarse_enough=25 precond.coarsening.type=ruge_stuben precond.coarsening.eps_strong=0.9 precond.relax.type=gaussseidel" Experiment 3: HYPRE CG preconditioned with AMG with the configuration indicated in the log files hypre'SIZE'.txt
https://www.dropbox.com/sh/mf38r0mew9eloli/AACumScoRpVqjo1VDks98wgIa?dl=0
Experiment 2 aims to emulate the configuration used by hypre in experiment 3.
I would be very grateful if you can say me if I'm not configuring properly a CG method preconditioned by AMG using AMGCL, or the reasons the number of iterations of CG method increase with the mesh size.
Thanks in advance!
It looks like the problem is either extremely anisotropic or convection-dominated (hence the need to set eps_strong=0.9), and thus is not suited very well for smoothed aggregation coarsening used in the first experiment.
Ruge-Stuben coarsening usually works better with such problems, and indeed, the convergence with your second set of parameters looks good enough to me. You could somewhat improve it by dropping precond.coarse_enough=25
(so that direct solver is applied at the finer level) and replacing gauss_seidel
relaxation with ilu0
:
$ ./solver -B -A A_37442.bin -f b_37442.bin -p solver.type=cg \
precond.coarsening.type=ruge_stuben \
precond.coarsening.eps_strong=0.9 \
precond.relax.type=ilu0
...
Iterations: 20
Error: 4.72897e-09
...
$ ./solver -B -A A_592130.bin -f b_592130.bin -p solver.type=cg \
precond.coarsening.type=ruge_stuben \
precond.coarsening.eps_strong=0.9 \
precond.relax.type=ilu0
...
Iterations: 70
Error: 7.93614e-09
...
What is the kind of problem you are solving here? Knowing this could help to further improve convergence.
I have to say I am impressed with hypre performance here, which seems to be related to the HMIS coarsening they use (unavailable in amgcl).
Thanks for your comments!
What is the kind of problem you are solving here? Knowing this could help to further improve convergence.
I'm solving a plane strain linear elasticity problem. The model is a 2d cantilever with structured mesh and constant material properties. All the displacements of the left side are constrainted and a uniform vertical load is applied to the right side. The resulting displacements are as follows
I have to say I am impressed with hypre performance here, which seems to be related to the HMIS coarsening they use (unavailable in amgcl).
Just to mention that I was testing AMGX with HMIS coarsening and the number of iterations also increase with the mesh refinement, but the problem could be related to other reason ...
I'm solving a plane strain linear elasticity problem.
Does that mean you have two unknowns per grid point? In this case the matrix should have block structure with 2x2 blocks, but it does not look that way. Do you order unknowns by grid points or by direction (x or y)? In the first case, we could either use point-wise aggregation coarsening or use block-valued matrix and rhs.
Also, providing near null space vectors (rigid body modes) usually helps with elasticity problems and aggregation coarsening.
After a second look at the matrix structure it does look like the unknowns are ordered by axis (x or y) rather than by grid point. Is it possible to switch the ordering?
Hi Denis
After a second look at the matrix structure it does look like the unknowns are ordered by axis (x or y) rather than by grid point. Is it possible to switch the ordering?
I have updated the dropbox link
https://www.dropbox.com/sh/mf38r0mew9eloli/AACumScoRpVqjo1VDks98wgIa?dl=0
including two folders: ordered_by_nodes and ordered_by_dim.
ordered_by_nodes: loop first over the nodes (inner loop) then over the vector dimension (outer loop); symbolically it can be represented as: XXX...,YYY...
ordered_by_dim: loop first over the vector dimension (inner loop) then over the nodes (outer loop); symbolically it can be represented as: XY,XY,XY,...
Taking a look at the experiments showed above using Hypre, I have to say that I used the ordered_by_dim format including the dimension of the problem using the HYPRE_BoomerAMGSetNumFunctions(dim) option ...
Using the information about block structure somewhat helps. Here is non-smoothed aggregation with node-wise coarsening:
./solver -A A_37442.mtx -f b_37442.mtx -p solver.{type=cg,maxiter=500} \
precond.relax.type=ilu0 precond.coarsening.type=aggregation \
precond.coarsening.aggr.block_size=2
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 4
Operator complexity: 1.79
Grid complexity: 1.65
Memory footprint: 44.13 M
level unknowns nonzeros memory
---------------------------------------------
0 37442 667012 23.86 M (55.76%)
1 18332 366176 13.03 M (30.61%)
2 4978 132874 4.55 M (11.11%)
3 1078 30092 2.70 M ( 2.52%)
Iterations: 346
Error: 9.31048e-09
[Profile: 3.325 s] (100.00%)
[ reading: 0.449 s] ( 13.52%)
[ setup: 0.085 s] ( 2.56%)
[ solve: 2.790 s] ( 83.91%)
and here is the case when the system matrix is treated as having 2x2 block values:
./solver -A A_37442.mtx -f b_37442.mtx -p solver.{type=cg,maxiter=500} \
precond.relax.type=ilu0 precond.coarsening.type=aggregation -b2
Solver
======
Type: CG
Unknowns: 18721
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.07
Grid complexity: 1.08
Memory footprint: 58.89 M
level unknowns nonzeros memory
---------------------------------------------
0 18721 482505 40.12 M (93.22%)
1 1470 35096 18.77 M ( 6.78%)
Iterations: 240
Error: 8.10336e-09
[Profile: 3.211 s] (100.00%)
[ reading: 0.449 s] ( 13.97%)
[ setup: 0.255 s] ( 7.95%)
[ solve: 2.506 s] ( 78.06%)
Still, the Ruge-Stuben coarsening seems to be optimal (among supported by amgcl) for this kind of problem.
Sorry, I've looked at the wrong matrices. The correct set to use with block-wise coarsening/block valued matrix is the ordered_by_dim one (XY,XY):
./solver -A A_37442_dim.mtx -f b_37442_dim.mtx -p solver.{type=cg,maxiter=500} \
precond.coarsening.type=aggregation precond.relax.type=ilu0 \
precond.coarsening.aggr.block_size=2
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.06
Grid complexity: 1.07
Memory footprint: 26.26 M
level unknowns nonzeros memory
---------------------------------------------
0 37442 667012 23.82 M (94.06%)
1 2450 42148 2.44 M ( 5.94%)
Iterations: 81
Error: 9.31612e-09
[Profile: 0.814 s] (100.00%)
[ reading: 0.448 s] ( 55.11%)
[ setup: 0.036 s] ( 4.47%)
[ solve: 0.328 s] ( 40.36%)
./solver -A A_37442_dim.mtx -f b_37442_dim.mtx -p solver.{type=cg,maxiter=500} \
precond.coarsening.type=aggregation precond.relax.type=ilu0 -b2
Solver
======
Type: CG
Unknowns: 18721
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.06
Grid complexity: 1.07
Memory footprint: 18.25 M
level unknowns nonzeros memory
---------------------------------------------
0 18721 166753 15.74 M (94.06%)
1 1225 10537 2.52 M ( 5.94%)
Iterations: 82
Error: 7.50392e-09
[Profile: 0.724 s] (100.00%)
[ reading: 0.492 s] ( 67.88%)
[ setup: 0.021 s] ( 2.85%)
[ solve: 0.211 s] ( 29.19%)
Performance-wise, these are almost compatible with Ruge-Stuben (also, note the difference in reported memory footprint; the RS is much more expensive here):
./solver -A A_37442.mtx -f b_37442.mtx -p solver.{type=cg,maxiter=500} \
precond.coarsening.{type=ruge_stuben,eps_strong=0.9} \
precond.relax.type=ilu0
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 5
Operator complexity: 2.03
Grid complexity: 1.92
Memory footprint: 50.06 M
level unknowns nonzeros memory
---------------------------------------------
0 37442 667012 24.48 M (49.36%)
1 18426 326064 11.99 M (24.13%)
2 9211 197059 7.10 M (14.58%)
3 4506 105972 3.73 M ( 7.84%)
4 2167 55135 2.76 M ( 4.08%)
Iterations: 21
Error: 6.15272e-09
[Profile: 0.674 s] (100.00%)
[ reading: 0.443 s] ( 65.69%)
[ setup: 0.078 s] ( 11.64%)
[ solve: 0.152 s] ( 22.59%)
Final note: there is no need to use non-smoothed aggregation either:
./solver -A A_37442_dim.mtx -f b_37442_dim.mtx -p solver.{type=cg,maxiter=500} \
precond.relax.type=ilu0 precond.coarsening.aggr.block_size=2
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.06
Grid complexity: 1.07
Memory footprint: 30.21 M
level unknowns nonzeros memory
---------------------------------------------
0 37442 667012 27.77 M (94.03%)
1 2450 42340 2.44 M ( 5.97%)
Iterations: 65
Error: 9.72811e-09
[Profile: 0.886 s] (100.00%)
[ reading: 0.455 s] ( 51.38%)
[ setup: 0.049 s] ( 5.49%)
[ solve: 0.382 s] ( 43.07%)
./solver -A A_37442_dim.mtx -f b_37442_dim.mtx -p solver.{type=cg,maxiter=500} \
precond.relax.type=ilu0 -b2
Solver
======
Type: CG
Unknowns: 18721
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.06
Grid complexity: 1.07
Memory footprint: 20.02 M
level unknowns nonzeros memory
---------------------------------------------
0 18721 166753 17.51 M (94.03%)
1 1225 10585 2.52 M ( 5.97%)
Iterations: 77
Error: 7.52487e-09
[Profile: 0.817 s] (100.00%)
[ reading: 0.477 s] ( 58.43%)
[ setup: 0.027 s] ( 3.27%)
[ solve: 0.312 s] ( 38.23%)
Thanks for your comments!
Just one last question. When I use block_size>1 I get the following runtime error
terminate called after throwing an instance of 'std::runtime_error' what(): Unsupported block size
coming from line 354 of https://github.com/ddemidov/amgcl/blob/master/examples/solver.cpp
How can I compile with support for block_size>1?
You should define AMGCL_BLOCK_SIZES
macro to something like (2)(3)(4)
. If you are using cmake to compile amgcl examples, then define cmake variable -DAMGCL_BLOCK_SIZES=2;3;4
to the same effect. The default is 3;4
:
Had some time today to look at the largest matrix from your set. Smoothed aggregation coarsening from amgcl manages to converge in the same time HMIS from Hypre does (keeping in mind we ran the tests on different machines; mine has an Intel Core i7 920 CPU @ 2.67GHz). It makes more iterations, but each iteration is cheaper, since the hiererachy is much lighter memory-wise:
$ ./solver -B -A ./problems/issue-114/A_592130.bin -f ./problems/issue-114/b_592130.bin \
-p solver.type=cg solver.maxiter=300 precond.relax.type=ilu0 -b2
Solver
======
Type: CG
Unknowns: 296065
Memory footprint: 18.07 M
Preconditioner
==============
Number of levels: 3
Operator complexity: 1.07
Grid complexity: 1.07
Memory footprint: 298.57 M
level unknowns nonzeros memory
---------------------------------------------
0 296065 2657665 278.53 M (93.74%)
1 18721 166753 17.52 M ( 5.88%)
2 1225 10585 2.52 M ( 0.37%)
Iterations: 110
Error: 8.61486e-09
[Profile: 10.466 s] (100.00%)
[ self: 0.011 s] ( 0.10%)
[ reading: 0.151 s] ( 1.44%)
[ setup: 0.640 s] ( 6.11%)
[ solve: 9.665 s] ( 92.35%)
Compare this to Hypre numbers from your report (3.95s for setup; 9.33 for solve in 17 iterations). Also, we can use a GPU with amgcl:
$ OCL_DEVICE=K40 ./solver_vexcl_cl -B \
-A ./problems/issue-114/A_592130.bin -f ./problems/issue-114/b_592130.bin \
-p solver.type=cg solver.maxiter=300 precond.relax.type=ilu0 -b2
1. Tesla K40c (NVIDIA CUDA)
Solver
======
Type: CG
Unknowns: 296065
Memory footprint: 18.07 M
Preconditioner
==============
Number of levels: 3
Operator complexity: 1.07
Grid complexity: 1.07
Memory footprint: 303.34 M
level unknowns nonzeros memory
---------------------------------------------
0 296065 2657665 283.00 M (93.74%)
1 18721 166753 17.79 M ( 5.88%)
2 1225 10585 2.56 M ( 0.37%)
Iterations: 115
Error: 8.43753e-09
[Profile: 2.679 s] (100.00%)
[ self: 0.200 s] ( 7.48%)
[ reading: 0.145 s] ( 5.41%)
[ setup: 0.866 s] ( 32.31%)
[ solve: 1.468 s] ( 54.80%)
Hi Denis,
I have observed a significant increment in the number of iterations using mpi_amg as the number of non-zeros per row increases. In order to reproduce the problem, I have increased the degree of the finite elements (p-method) of this model
I'm solving a plane strain linear elasticity problem. The model is a 2d cantilever with structured mesh and constant material properties. All the displacements of the left side are constrainted and a uniform vertical load is applied to the right side.
combined with the refining of the mesh (h-method). The coefficient matrices (A2) and rhs vectors (b2) are located at
https://www.dropbox.com/sh/mf38r0mew9eloli/AACumScoRpVqjo1VDks98wgIa?dl=0
We can compare the results with two problems of 37442 unknows (18432 vs 4608 finite elements) with different sparsity of the coefficient matrix:
./solver -A ordered_by_dim/A_37442.mtx -f ordered_by_dim/b_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.relax.type=ilu0 precond.coarsening.aggr.block_size=2
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
======
Number of levels: 2
Operator complexity: 1.06
Grid complexity: 1.07
Memory footprint: 30.21 M
level unknowns nonzeros memory
0 37442 667012 27.77 M (94.03%)
1 2450 42340 2.44 M ( 5.97%)
Iterations: 65
Error: 9.72811e-09
[Profile: 0.922 s] (100.00%)
[ reading: 0.546 s] ( 59.21%)
[ setup: 0.050 s] ( 5.44%)
[ solve: 0.326 s] ( 35.30%)
./solver -A ordered_by_dim/A2_37442.mtx -f ordered_by_dim/b2_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.relax.type=ilu0 precond.coarsening.aggr.block_size=2
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.07
Grid complexity: 1.06
Memory footprint: 48.92 M
level unknowns nonzeros memory
---------------------------------------------
0 37442 1184260 44.29 M (93.27%)
1 2398 85516 4.63 M ( 6.73%)
Iterations: 4619
Error: 9.98768e-09
[Profile: 38.110 s] (100.00%)
[ reading: 0.945 s] ( 2.48%)
[ setup: 0.122 s] ( 0.32%)
[ solve: 37.042 s] ( 97.20%)
./solver -A ordered_by_dim/A_37442.mtx -f ordered_by_dim/b_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.relax.type=ilu0 -b2
Solver
======
Type: CG
Unknowns: 18721
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.06
Grid complexity: 1.07
Memory footprint: 20.02 M
level unknowns nonzeros memory
---------------------------------------------
0 18721 166753 17.51 M (94.03%)
1 1225 10585 2.52 M ( 5.97%)
Iterations: 77
Error: 7.52487e-09
[Profile: 0.857 s] (100.00%)
[ reading: 0.558 s] ( 65.16%)
[ setup: 0.036 s] ( 4.18%)
[ solve: 0.262 s] ( 30.61%)
./solver -A ../mtx/ordered_by_dim/A2_37442.mtx -f ../mtx/ordered_by_dim/b2_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=10000 precond.relax.type=ilu0 -b2
Solver
======
Type: CG
Unknowns: 18721
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.07
Grid complexity: 1.06
Memory footprint: 33.16 M
level unknowns nonzeros memory
---------------------------------------------
0 18721 296065 27.85 M (93.24%)
1 1199 21469 5.30 M ( 6.76%)
Iterations: 10000
Error: 13.9903
[Profile: 74.239 s] (100.00%)
[ reading: 0.967 s] ( 1.30%)
[ setup: 0.094 s] ( 0.13%)
[ solve: 73.178 s] ( 98.57%)
./solver -A ordered_by_nodes/A_37442.mtx -f ordered_by_nodes/b_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.coarsening.type=ruge_stuben precond.coarsening.eps_strong=0.9 precond.relax.type=ilu0
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 5
Operator complexity: 2.03
Grid complexity: 1.92
Memory footprint: 50.06 M
level unknowns nonzeros memory
---------------------------------------------
0 37442 667012 24.48 M (49.36%)
1 18426 326064 11.99 M (24.13%)
2 9211 197059 7.10 M (14.58%)
3 4506 105972 3.73 M ( 7.84%)
4 2167 55135 2.76 M ( 4.08%)
Iterations: 21
Error: 6.15272e-09
[Profile: 0.804 s] (100.00%)
[ reading: 0.552 s] ( 68.68%)
[ setup: 0.086 s] ( 10.67%)
[ solve: 0.166 s] ( 20.60%)
./solver -A ordered_by_nodes/A2_37442.mtx -f ordered_by_nodes/b2_37442.mtx -p solver.type=cg solver.tol=1e-8 solver.maxiter=5000 precond.coarsening.type=ruge_stuben precond.coarsening.eps_strong=0.9 precond.relax.type=ilu0
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 4
Operator complexity: 1.85
Grid complexity: 1.62
Memory footprint: 73.53 M
level unknowns nonzeros memory
---------------------------------------------
0 37442 1184260 40.37 M (54.16%)
1 13846 662934 21.70 M (30.32%)
2 6904 285918 9.47 M (13.07%)
3 2305 53659 1.98 M ( 2.45%)
Iterations: 242
Error: 8.07748e-09
[Profile: 4.089 s] (100.00%)
[ reading: 0.944 s] ( 23.09%)
[ setup: 0.170 s] ( 4.15%)
[ solve: 2.974 s] ( 72.75%)
I know that the number of iterations can increase as the sparsity also do it since the condition number is different, but they seem a lot of iterations. It does not converge using block values, I am probably doing something wrong. I solve this problem (A2_37442 and b2_37442) using a cg with Jacobi preconditioning taking 752 iterations. I would appreciate your opinion about this issue.
Looks like ilu0 is behaving badly in this case. spai0 fixes the convergence:
./solver -B -A A2_37442.bin -f b2_37442.bin -p solver.{type=cg,tol=1e-8,maxiter=200} \
precond.relax.type=spai0 precond.coarsening.aggr.block_size=2
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.07
Grid complexity: 1.06
Memory footprint: 30.25 M
level unknowns nonzeros memory
---------------------------------------------
0 37442 1184260 25.62 M (93.27%)
1 2398 85516 4.63 M ( 6.73%)
Iterations: 104
Error: 8.10064e-09
[Profile: 0.953 s] (100.00%)
[ reading: 0.021 s] ( 2.23%)
[ setup: 0.107 s] ( 11.20%)
[ solve: 0.824 s] ( 86.48%)
Not sure what happens with ilu0, but spai0 is much cheaper to setup and scales better across multiple cores/mpi processes, so if it works, I would just use it.
Another option is ilu(k) (with k=1 by default):
./solver -B -A A2_37442.bin -f b2_37442.bin -p solver.{type=cg,tol=1e-8,maxiter=200} \
precond.relax.type=iluk precond.coarsening.aggr.block_size=2
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.07
Grid complexity: 1.06
Memory footprint: 90.50 M
level unknowns nonzeros memory
---------------------------------------------
0 37442 1184260 85.87 M (93.27%)
1 2398 85516 4.63 M ( 6.73%)
Iterations: 40
Error: 4.9139e-09
[Profile: 2.943 s] (100.00%)
[ reading: 0.022 s] ( 0.74%)
[ setup: 2.182 s] ( 74.15%)
[ solve: 0.738 s] ( 25.08%)
And just to check that ilu0 is working as intended, I can reproduce bad convergence with iluk, k=0:
./solver -B -A A2_37442.bin -f b2_37442.bin -p solver.{type=cg,tol=1e-8,maxiter=5000} \
precond.relax.{type=iluk,k=0} precond.coarsening.aggr.block_size=2
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.07
Grid complexity: 1.06
Memory footprint: 48.92 M
level unknowns nonzeros memory
---------------------------------------------
0 37442 1184260 44.29 M (93.27%)
1 2398 85516 4.63 M ( 6.73%)
Iterations: 4619
Error: 9.98768e-09
[Profile: 54.090 s] (100.00%)
[ reading: 0.021 s] ( 0.04%)
[ setup: 0.377 s] ( 0.70%)
[ solve: 53.691 s] ( 99.26%)
Not sure what happens with ilu0, but spai0 is much cheaper to setup and scales better across multiple cores/mpi processes, so if it works, I would just use it.
Thank you Dennis for the suggestion. It using spai0 the cg converges and it scales better across multiple mpi processes.
Another option is ilu(k) (with k=1 by default):
Just to mention that iluk (with k=1 by default) is not working properly using solver_vexcl_cl and solver_vexcl_cuda.
Just to mention that iluk (with k=1 by default) is not working properly using solver_vexcl_cl and solver_vexcl_cuda.
ILU* methods in AMGCL are applied approximately when on a GPGPU backend. That is, instead of doing a forward and backward sweep which has very low parallelism, a couple of Jacobi iterations are used to approximately solve the triangular systems. The default number of iterations is 2, but you can increase the number with precond.relax.solve.iters
parameter:
Default number of iterations:
./solver_vexcl_cl -B -A A2_37442.bin -f b2_37442.bin -p solver.{type=cg,tol=1e-8} \
precond.relax.type=iluk precond.coarsening.aggr.block_size=2
1. Tesla K40c (NVIDIA CUDA)
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.07
Grid complexity: 1.06
Memory footprint: 90.48 M
level unknowns nonzeros memory
---------------------------------------------
0 37442 1184260 85.81 M (93.27%)
1 2398 85516 4.66 M ( 6.73%)
Iterations: 100
Error: 163.396
[Profile: 5.688 s] (100.00%)
[ self: 1.171 s] ( 20.59%)
[ reading: 0.414 s] ( 7.29%)
[ setup: 2.785 s] ( 48.97%)
[ solve: 1.317 s] ( 23.16%)
Increased number of iterations
./solver_vexcl_cl -B -A A2_37442.bin -f b2_37442.bin -p solver.{type=cg,tol=1e-8} \
precond.relax.{type=iluk,solve.iters=5} precond.coarsening.aggr.block_size=2
1. Tesla K40c (NVIDIA CUDA)
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.07
Grid complexity: 1.06
Memory footprint: 90.48 M
level unknowns nonzeros memory
---------------------------------------------
0 37442 1184260 85.81 M (93.27%)
1 2398 85516 4.66 M ( 6.73%)
Iterations: 51
Error: 9.58117e-09
[Profile: 2.854 s] (100.00%)
[ self: 0.195 s] ( 6.85%)
[ reading: 0.018 s] ( 0.62%)
[ setup: 2.155 s] ( 75.53%)
[ solve: 0.485 s] ( 17.00%)
even more:
./solver_vexcl_cl -B -A A2_37442.bin -f b2_37442.bin -p solver.{type=cg,tol=1e-8} \
precond.relax.{type=iluk,solve.iters=7} precond.coarsening.aggr.block_size=2
1. Tesla K40c (NVIDIA CUDA)
Solver
======
Type: CG
Unknowns: 37442
Memory footprint: 1.14 M
Preconditioner
==============
Number of levels: 2
Operator complexity: 1.07
Grid complexity: 1.06
Memory footprint: 90.48 M
level unknowns nonzeros memory
---------------------------------------------
0 37442 1184260 85.81 M (93.27%)
1 2398 85516 4.66 M ( 6.73%)
Iterations: 40
Error: 5.69973e-09
[Profile: 2.941 s] (100.00%)
[ self: 0.196 s] ( 6.66%)
[ reading: 0.021 s] ( 0.73%)
[ setup: 2.241 s] ( 76.21%)
[ solve: 0.482 s] ( 16.40%)
Of course, the additional iterations have cost, so you have to strike some kind of balance here.
Hello,
I'm trying to implement a CG preconditioned by AMG (CG-AMG) as shown in Figures 17 and 18 of the following paper
https://asc.ziti.uni-heidelberg.de/sites/default/files/research/papers/public/NaArCa_15AmgX.pdf
in order to reduce the number of iterations taken to converge the CG solver. I would be very grateful if you can indicate me if the CG implementation can be used without applying any preconditioning in each iteration of the solver ( P.apply(r, s); in line 160 of amgcl/solver/cg.hpp ). I would like to use the AMGCL implementation to take advantage of parallelization functionalities.
Thanks in advance!