ddemidov / amgcl

C++ library for solving large sparse linear systems with algebraic multigrid method
http://amgcl.readthedocs.org/
MIT License

I do not know how to tune amgcl to make the solver converge #272

Open haoxiangNtu opened 6 months ago

haoxiangNtu commented 6 months ago

Hello Sir, I want to express my heartfelt gratitude for your outstanding work in creating AMGCL!

I have encountered slow convergence when solving a linear system with the matrix A_hand. A_hand, along with the vector f_hand, is assembled from a 3D tetrahedral mesh used for Finite Element Method (FEM) object simulation; the matrix is approximately 176328 x 176328. I have followed the recommendations in issue #175: specifically, I used the provided main_2.cpp, which employs a block structure with 3x3 blocks and sets precond.coarsening.aggr.block_size=3. I have also experimented with other meshes, including the one yielding A_shoe, which has only 48033 unknowns, but it still requires an excessive number of iterations to reach a reasonably accurate result, as indicated below.
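For reference, my block-valued setup follows roughly this pattern (a simplified sketch rather than the exact main_2.cpp from #175; I assume here that n, ptr, col, val are the scalar CRS arrays of the assembled stiffness matrix, and f is the right-hand side):

```cpp
#include <vector>
#include <tuple>
#include <iostream>

#include <amgcl/backend/builtin.hpp>
#include <amgcl/value_type/static_matrix.hpp>
#include <amgcl/adapter/crs_tuple.hpp>
#include <amgcl/adapter/block_matrix.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/relaxation/ilu0.hpp>
#include <amgcl/solver/bicgstab.hpp>

int main() {
    // Scalar CRS arrays of the assembled system (loaded elsewhere):
    ptrdiff_t n = 0;
    std::vector<ptrdiff_t> ptr, col;
    std::vector<double> val, f, x;
    // ... fill n, ptr, col, val, f; resize x to n and zero it ...

    // One 3x3 block couples the x/y/z displacements of a single node:
    typedef amgcl::static_matrix<double, 3, 3> dmat_type;
    typedef amgcl::backend::builtin<dmat_type> Backend;

    typedef amgcl::make_solver<
        amgcl::amg<Backend,
                   amgcl::coarsening::smoothed_aggregation,
                   amgcl::relaxation::ilu0>,
        amgcl::solver::bicgstab<Backend>
        > Solver;

    // Reinterpret the scalar matrix as an (n/3)x(n/3) matrix of 3x3 blocks:
    auto A = std::tie(n, ptr, col, val);
    Solver solve(amgcl::adapter::block_matrix<dmat_type>(A));

    // The rhs and the solution are reinterpreted as block vectors as well:
    auto F = amgcl::backend::reinterpret_as_rhs<dmat_type>(f);
    auto X = amgcl::backend::reinterpret_as_rhs<dmat_type>(x);

    int iters; double error;
    std::tie(iters, error) = solve(F, X);
    std::cout << iters << " " << error << std::endl;
}
```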

I have three questions here:

  1. Is there a more effective approach to expedite the convergence of the linear system? I am uncertain why the convergence for such a matrix is slow and how to fine-tune the AMG solver. To provide more insight, I have included a matrix portrait below.

  2. How can I implement the block structure in the CUDA-compatible AMG solver? The block-structured file I previously used was designed for Eigen, and I am unsure how to adapt it to the CUDA version (is there a block structure for CUDA?).

  3. If I could get a good initial guess for the solution vector x, how do I specify it in the code? (I am working on a project that designs the material distribution in an object, updating the material in each iteration, so I think the solution computed in the previous iteration would help convergence in the next one.)

Attached, you will find the main.cpp and the results. The matrix can be downloaded from the provided link.

I would be immensely grateful for any suggestions you may offer regarding optimal parameters and the combination of options to expedite the solution.

Thank you very much for your time, Sir.

Result for A_hand:

Matrix A has 176328 rows and 176328 columns. Matrix A is rowmajor.

Solver
Type:             BiCGStab
Unknowns:         58776
Memory footprint: 9.42 M

Preconditioner
Number of levels:    3
Operator complexity: 1.35
Grid complexity:     1.08
Memory footprint:    113.82 M

level unknowns nonzeros memory
0        58776         676276     91.06 M (74.17%)
1         4537         219611     19.71 M (24.08%)
2          299          15945      3.05 M ( 1.75%)

Iterations: 500    Error: 4.39234e-06

[Profile:  20.294 s] (100.00%)
[  self:    7.117 s] ( 35.07%)
[  setup:   0.149 s] (  0.73%)
[  solve:  13.028 s] ( 64.20%)

Result for A_Shoe:

Matrix A has 48033 rows and 48033 columns. Matrix A is rowmajor.

Solver
Type:             BiCGStab
Unknowns:         16011
Memory footprint: 2.57 M

Preconditioner
Number of levels:    3
Operator complexity: 1.13
Grid complexity:     1.08
Memory footprint:    25.40 M

level unknowns nonzeros memory
0        16011         177785     22.74 M (88.25%)
1         1206          22398      2.31 M (11.12%)
2           86           1282    348.96 K ( 0.64%)

Iterations: 399    Error: 7.81433e-09

[Profile:   3.833 s] (100.00%)
[  self:    1.882 s] ( 49.10%)
[  setup:   0.022 s] (  0.58%)
[  solve:   1.929 s] ( 50.32%)

Matrix portrait of A_hand: [attached images: HandSparse, HandSparseValue]

Matrix portrait of A_shoe: [attached images: ShoeSparse, ShoeSparseValue]

ddemidov commented 6 months ago

Hello,

Is there a more effective approach to expedite the convergence of the linear system? I am uncertain why the convergence for such a matrix is slow and how to fine-tune the AMG solver.

I could not improve on the convergence results you show. You could try to use grid coordinates to generate near-nullspace vectors (see https://amgcl.readthedocs.io/en/latest/tutorial/Nullspace.html).
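For a 3D elasticity problem the near-nullspace is spanned by the six rigid body modes, which can be built from the node coordinates. A minimal sketch (the wrapper function and its argument names are illustrative; coo is assumed to hold the x/y/z coordinates of the free nodes):

```cpp
#include <vector>
#include <tuple>

#include <amgcl/backend/builtin.hpp>
#include <amgcl/adapter/crs_tuple.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/coarsening/rigid_body_modes.hpp>
#include <amgcl/relaxation/spai0.hpp>
#include <amgcl/solver/cg.hpp>

typedef amgcl::backend::builtin<double> Backend; // scalar setup
typedef amgcl::make_solver<
    amgcl::amg<Backend,
               amgcl::coarsening::smoothed_aggregation,
               amgcl::relaxation::spai0>,
    amgcl::solver::cg<Backend>
    > Solver;

// n, ptr, col, val: the scalar CRS arrays of the system matrix;
// coo = {x0, y0, z0, x1, y1, z1, ...}: coordinates of every non-fixed
// node, ordered consistently with the unknowns; f: rhs, x: solution.
void solve_with_rigid_body_modes(
        ptrdiff_t n,
        const std::vector<ptrdiff_t> &ptr,
        const std::vector<ptrdiff_t> &col,
        const std::vector<double>    &val,
        const std::vector<double>    &coo,
        const std::vector<double>    &f,
        std::vector<double>          &x)
{
    Solver::params prm;

    // rigid_body_modes() converts the coordinates into the six rigid
    // body modes of 3D elasticity and returns their count:
    prm.precond.coarsening.nullspace.cols =
        amgcl::coarsening::rigid_body_modes(
                3, coo, prm.precond.coarsening.nullspace.B);

    Solver solve(std::tie(n, ptr, col, val), prm);

    int iters; double error;
    std::tie(iters, error) = solve(f, x);
}
```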

How can I implement the block-structure in the CUDA-compatible AMG solver? The file with a block-structure I previously used was designed for Eigen, and I am unsure how to adapt it to the CUDA version (block structure for CUDA?).

The CUDA backend does not support block-valued problems, but you could use the VexCL backend with CUDA as the VexCL compute backend. There is an example of using the VexCL backend with block-valued systems here: https://amgcl.readthedocs.io/en/latest/tutorial/Serena.html
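Something along the lines of the following sketch (adapted from the Serena tutorial; VexCL has to be built with its CUDA backend, e.g. with -DVEXCL_BACKEND_CUDA, and the CRS arrays are assumed to be loaded elsewhere):

```cpp
#include <iostream>
#include <vector>
#include <tuple>

#include <vexcl/vexcl.hpp>

#include <amgcl/backend/vexcl.hpp>
#include <amgcl/backend/vexcl_static_matrix.hpp>
#include <amgcl/adapter/crs_tuple.hpp>
#include <amgcl/adapter/block_matrix.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/relaxation/ilu0.hpp>
#include <amgcl/solver/bicgstab.hpp>

typedef amgcl::static_matrix<double, 3, 3> dmat_type;
typedef amgcl::static_matrix<double, 3, 1> dvec_type;
typedef amgcl::backend::vexcl<dmat_type> Backend;

typedef amgcl::make_solver<
    amgcl::amg<Backend,
               amgcl::coarsening::smoothed_aggregation,
               amgcl::relaxation::ilu0>,
    amgcl::solver::bicgstab<Backend>
    > Solver;

int main() {
    // Scalar CRS arrays and the rhs, loaded elsewhere:
    ptrdiff_t n = 0;
    std::vector<ptrdiff_t> ptr, col;
    std::vector<double> val, rhs;
    // ...

    vex::Context ctx(vex::Filter::Env && vex::Filter::Count(1));
    std::cout << ctx << std::endl;

    // Make the 3x3 static matrix type known to the VexCL kernels:
    vex::scoped_program_header header(ctx,
            amgcl::backend::vexcl_static_matrix_declaration<double, 3>());

    Backend::params bprm;
    bprm.q = ctx;

    auto A = std::tie(n, ptr, col, val);
    Solver solve(amgcl::adapter::block_matrix<dmat_type>(A),
            Solver::params(), bprm);

    // Transfer the rhs to the GPU as 3x1 blocks; start from a zero guess:
    vex::vector<dvec_type> f(ctx, n / 3,
            reinterpret_cast<dvec_type*>(rhs.data()));
    vex::vector<dvec_type> x(ctx, n / 3);
    x = amgcl::math::zero<dvec_type>();

    int iters; double error;
    std::tie(iters, error) = solve(f, x);
    std::cout << iters << " " << error << std::endl;
}
```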

If i could get a good initial solution for result vector x, how to specify it in the code?

You should put it into the solution vector. Vector x is treated both as input (the initial approximation) and as output (the result).
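For example (a sketch, with x_prev holding the solution from the previous design iteration):

```cpp
// Warm start: seed x with the previous solution. The solver reads x as
// the initial approximation and overwrites it with the new result.
std::vector<double> x = x_prev;

int iters; double error;
std::tie(iters, error) = solve(f, x);
```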

haoxiangNtu commented 6 months ago

Hello Sir,

I have been studying and trying many options in AMGCL for the past few days, but there has been no progress. After using the nullspace, I found that the improvement in convergence is not significant. I am not sure if this is because the C matrix I constructed is incorrect: my stiffness matrix is a general tetrahedral stiffness matrix, and I have removed the rows and columns corresponding to the fixed points, which means I have also removed the coordinates of the fixed points from the C matrix (this matrix is also uploaded in the previous link). I am not sure if this approach is correct. Additionally, I tried using ns_search to find an approximate nullspace, but the results are still not satisfactory.

Here are some simple results from my tests. Currently, the CG solver with the block option gives the best result, but it takes about 1.6 seconds. I have also implemented the VexCL backend with a block-valued system and the CUDA option, but it still takes about 0.62 seconds. However, when I tested Intel MKL's Pardiso library, the solve took only about 0.608 seconds. Do you have any further suggestions?

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.maxiter=1000 -b3

Solver
Type:             BiCGStab
Unknowns:         16011
Memory footprint: 2.57 M

Preconditioner
Number of levels:    3
Operator complexity: 1.13
Grid complexity:     1.08
Memory footprint:    25.39 M

level unknowns nonzeros memory
0        16011         177785     22.74 M (88.25%)
1         1206          22398      2.31 M (11.12%)
2           86           1272    345.44 K ( 0.63%)

Iterations: 290    Error: 8.76707e-09

[Profile:   3.483 s] (100.00%)
[  reading: 1.579 s] ( 45.33%)
[  setup:   0.019 s] (  0.54%)
[  solve:   1.885 s] ( 54.11%)

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 -b3

Solver
Type:             CG
Unknowns:         16011
Memory footprint: 1.47 M

Preconditioner
Number of levels:    3
Operator complexity: 1.13
Grid complexity:     1.08
Memory footprint:    25.39 M

level unknowns nonzeros memory
0        16011         177785     22.74 M (88.25%)
1         1206          22398      2.31 M (11.12%)
2           86           1272    345.44 K ( 0.63%)

Iterations: 513    Error: 9.83891e-09

[Profile:   3.301 s] (100.00%)
[  reading: 1.581 s] ( 47.89%)
[  setup:   0.025 s] (  0.75%)
[  solve:   1.695 s] ( 51.34%)

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 -b3 precond.coarsening.aggr.eps_strong=0 -C C_Shoe.mtx

Iterations: 890 Error: 9.74222e-09

[Profile:   5.619 s] (100.00%)
[  reading: 1.612 s] ( 28.69%)
[  setup:   0.080 s] (  1.43%)
[  solve:   3.926 s] ( 69.87%)

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 precond.coarsening.aggr.eps_strong=0 -C C_Shoe.mtx

Iterations: 471 Error: 9.75176e-09

[Profile:   5.453 s] (100.00%)
[  reading: 1.634 s] ( 29.96%)
[  setup:   0.097 s] (  1.77%)
[  solve:   3.722 s] ( 68.26%)

./ns_search -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 precond.coarsening.aggr.eps_strong=0 -n6 -o N6.mtx

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 precond.coarsening.aggr.eps_strong=0 -N N6.mtx

Iterations: 235 Error: 9.75834e-09

[Profile:   3.683 s] (100.00%)
[  reading: 1.780 s] ( 48.32%)
[  setup:   0.101 s] (  2.74%)
[  solve:   1.802 s] ( 48.93%)

./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 -b3 precond.coarsening.aggr.eps_strong=0 -N N6.mtx

Iterations: 459    Error: 9.82758e-09

[Profile:   3.762 s] (100.00%)
[  reading: 1.720 s] ( 45.73%)
[  setup:   0.076 s] (  2.02%)
[  solve:   1.964 s] ( 52.21%)

VexCL backend with a block-valued system and the CUDA option:

  1. NVIDIA GeForce RTX 2080 Ti

Matrix A_0 has 48033 rows and 48033 columns. Matrix A_0 is rowmajor.
Matrix Serena.mtx: 48033x48033

Solver
Type:             CG
Unknowns:         16011
Memory footprint: 1.47 M

Preconditioner
Number of levels:    3
Operator complexity: 1.13
Grid complexity:     1.08
Memory footprint:    13.96 M

level unknowns nonzeros memory
0        16011         177785     12.51 M (88.25%)
1         1206          22398      1.27 M (11.12%)
2           86           1272    175.07 K ( 0.63%)

Iters: 461    Error: 9.98164e-08

[Serena (VexCL): 2.598 s] (100.00%)
[  GPU matrix:   0.013 s] (  0.48%)
[  read:         1.916 s] ( 73.75%)
[  setup:        0.047 s] (  1.82%)
[  solve:        0.622 s] ( 23.93%)

ddemidov commented 6 months ago

It looks like the problem is just too difficult to solve with a simple iterative approach. Intel Pardiso is a direct solver, so it does not have this problem. Direct solvers usually do not scale as well as iterative ones, but if you do not need to solve larger systems and are satisfied with the Pardiso performance, then maybe a direct solver is your best approach here.

Otherwise, you probably need to research methods for solving your particular problem. I am sorry I could not be of more help.