Open haoxiangNtu opened 6 months ago
Hello,
Is there a more effective approach to expedite the convergence of the linear system? I am uncertain why the convergence for such a matrix is slow and how to fine-tune the AMG solver.
I could not improve on the convergence results you show. You could try to use grid coordinates to generate near-nullspace vectors (see https://amgcl.readthedocs.io/en/latest/tutorial/Nullspace.html).
How can I implement the block-structure in the CUDA-compatible AMG solver? The file with a block-structure I previously used was designed for Eigen, and I am unsure how to adapt it to the CUDA version (block structure for CUDA?).
The CUDA backend does not support block-valued solution, but you could use VexCL backend with CUDA as VexCL backend. There is an example of using VexCL backend with block-valued systems here: https://amgcl.readthedocs.io/en/latest/tutorial/Serena.html
If i could get a good initial solution for result vector x, how to specify it in the code?
You should put it into the solution vector. Vector x is treated as both input (for initial approximation) and output (for the result).
Hello Sir,
I have been studying and trying many options in AMGCL for the past few days, but still, there has been no progress. After using the nullspace, I found that the improvement in convergence is not significant. I'm not sure if this is because the C matrix I constructed is incorrect. My stiffness matrix is a general tetrahedral stiffness matrix, and I have removed the rows and columns corresponding to the fixed points. This means that I have also removed the coordinates of the fixed points in the C matrix (this matrix is also uploaded in the previous link), and I am not sure if this approach is correct. Additionally, I tried using ns_search to find approximate nullspace, but the results are still not satisfactory. Here are some simple results from my tests: currently, using the CG solver with the block option gives the best result, but it takes about 1.6 seconds. I have also implemented the VexCL backend with block-valued systems with the CUDA option, but it also takes about 0.62 seconds to compute. However, when I tested with the Intel MKL's Pardiso library, the results only took about 0.608 seconds. Do you have any further suggestions?
./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.maxiter=1000 -b3
Type: BiCGStab Unknowns: 16011 Memory footprint: 2.57 M
Number of levels: 3 Operator complexity: 1.13 Grid complexity: 1.08 Memory footprint: 25.39 M
0 16011 177785 22.74 M (88.25%)
1 1206 22398 2.31 M (11.12%)
2 86 1272 345.44 K ( 0.63%)
Iterations: 290 Error: 8.76707e-09
[Profile: 3.483 s] (100.00%) [ reading: 1.579 s] ( 45.33%) [ setup: 0.019 s] ( 0.54%) [ solve: 1.885 s] ( 54.11%)
./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 -b3
Type: CG Unknowns: 16011 Memory footprint: 1.47 M
Number of levels: 3 Operator complexity: 1.13 Grid complexity: 1.08 Memory footprint: 25.39 M
0 16011 177785 22.74 M (88.25%)
1 1206 22398 2.31 M (11.12%)
2 86 1272 345.44 K ( 0.63%)
Iterations: 513 Error: 9.83891e-09
[Profile: 3.301 s] (100.00%) [ reading: 1.581 s] ( 47.89%) [ setup: 0.025 s] ( 0.75%) [ solve: 1.695 s] ( 51.34%)
./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 -b3 precond.coarsening.aggr.eps_strong=0 -C C_Shoe.mtx
Iterations: 890 Error: 9.74222e-09
[Profile: 5.619 s] (100.00%) [ reading: 1.612 s] ( 28.69%) [ setup: 0.080 s] ( 1.43%) [ solve: 3.926 s] ( 69.87%)
./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 precond.coarsening.aggr.eps_strong=0 -C C_Shoe.mtx
Iterations: 471 Error: 9.75176e-09
[Profile: 5.453 s] (100.00%) [ reading: 1.634 s] ( 29.96%) [ setup: 0.097 s] ( 1.77%) [ solve: 3.722 s] ( 68.26%)
./ns_search -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 precond.coarsening.aggr.eps_strong=0 -n6 -o N6.mtx
./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 precond.coarsening.aggr.eps_strong=0 -N N6.mtx
Iterations: 235 Error: 9.75834e-09
[Profile: 3.683 s] (100.00%) [ reading: 1.780 s] ( 48.32%) [ setup: 0.101 s] ( 2.74%) [ solve: 1.802 s] ( 48.93%)
./solver -A A_Shoe.mtx -f f_Shoe.mtx solver.type=cg solver.maxiter=1000 -b3 precond.coarsening.aggr.eps_strong=0 -N N6.mtx Iterations: 459 Error: 9.82758e-09
[Profile: 3.762 s] (100.00%) [ reading: 1.720 s] ( 45.73%) [ setup: 0.076 s] ( 2.02%) [ solve: 1.964 s] ( 52.21%)
VexCL backend with block-valued systems with the CUDA option:
Type: CG Unknowns: 16011 Memory footprint: 1.47 M
Number of levels: 3 Operator complexity: 1.13 Grid complexity: 1.08 Memory footprint: 13.96 M
0 16011 177785 12.51 M (88.25%)
1 1206 22398 1.27 M (11.12%)
2 86 1272 175.07 K ( 0.63%)
Iters: 461 Error: 9.98164e-08
[Serena (VexCL): 2.598 s] (100.00%) [ GPU matrix: 0.013 s] ( 0.48%) [ read: 1.916 s] ( 73.75%) [ setup: 0.047 s] ( 1.82%) [ solve: 0.622 s] ( 23.93%)
It looks like the problem is just too difficult to solve with a simple iterative approach. Intel Pardiso is a direct solver, so it does not have this problem. Direct solvers usually do not scale as good as iterative ones, but if you do not need to solve larger systems, and are satisfied with the Pardiso performance, then may be the direct solver is your best approach here.
Otherwise, you probably need to research on the methods to solve your particular problem. I am sorry I could not be of more help.
Hello Sir, I want to express my heartfelt gratitude for your outstanding work in creating AMGCL!
I have encountered an issue related to the convergence of the linear system matrix A_hand, which is causing slow convergence. A_hand, along with vector f_hand, is computed from a 3D tetrahedral mesh used for Finite Element Method (FEM) object simulation. A_hand is a matrix of approximately 176328 x 176328 dimensions. I have also followed the recommendations mentioned in issue #175. Specifically, I have used the provided main_2.cpp, which employs a block-structure with 3x3 blocks and sets precond.coarsening.aggr.block_size=3. I have also experimented with other meshes containing A_shoe, which has only 48033 unknowns. However, it still requires an excessive number of iterations to achieve a relatively better result, as indicated below.
I have three questions here:
Is there a more effective approach to expedite the convergence of the linear system? I am uncertain why the convergence for such a matrix is slow and how to fine-tune the AMG solver. To provide more insight, I have included a matrix portrait below.
How can I implement the block-structure in the CUDA-compatible AMG solver? The file with a block-structure I previously used was designed for Eigen, and I am unsure how to adapt it to the CUDA version (block structure for CUDA?).
If i could get a good initial solution for result vector x, how to specify it in the code? (Since here im doing an project that need to design material distribution in an object (update material in each iteration). I think the solution computed from last iteration would help for converge in the next iteration.)
Attached, you will find the main.cpp and the results. The matrix can be downloaded from the provided link.
I would be immensely grateful for any suggestions you may offer regarding optimal parameters and the combination of options to expedite the solution.
Thank you very much for your time, Sir.
result of A_hand Matrix A has 176328 rows and 176328 columns. Matrix A is rowmajor Solver
Type: BiCGStab Unknowns: 58776 Memory footprint: 9.42 M
Preconditioner
Number of levels: 3 Operator complexity: 1.35 Grid complexity: 1.08 Memory footprint: 113.82 M
level unknowns nonzeros memory
500 4.39234e-06
[Profile: 20.294 s] (100.00%) [ self: 7.117 s] ( 35.07%) [ setup: 0.149 s] ( 0.73%) [ solve: 13.028 s] ( 64.20%)
result of A_Shoe Matrix A has 48033 rows and 48033 columns. Matrix A is rowmajor Solver
Type: BiCGStab Unknowns: 16011 Memory footprint: 2.57 M
Preconditioner
Number of levels: 3 Operator complexity: 1.13 Grid complexity: 1.08 Memory footprint: 25.40 M
level unknowns nonzeros memory
399 7.81433e-09
[Profile: 3.833 s] (100.00%) [ self: 1.882 s] ( 49.10%) [ setup: 0.022 s] ( 0.58%) [ solve: 1.929 s] ( 50.32%)
Matrix portrait of A_hand
![HandSparseValue](https://github.com/ddemidov/amgcl/assets/110380534/68109815-19ee-4d1b-8f1b-d7c0682f3304)
Matrix portrait of A_shoe
![ShoeSparseValue](https://github.com/ddemidov/amgcl/assets/110380534/9941d313-fae8-43c2-8296-2888cd758bb3)