NVIDIA / AMGX

Distributed multigrid linear solver library on GPU

Is there a way to reduce solve setup time? #282

Closed leemunseon closed 1 week ago

leemunseon commented 7 months ago

I solve Ax = b once. With "max_level" = 1 the result matches the solution computed on the CPU.

However, it is slower than the CPU.

Increasing "max_level" reduces the setup time but increases the number of iterations.

Is there a way to reduce the setup time?

Or can time be saved by removing other unnecessary settings?

GPU: NVIDIA A10

AMGX version 2.4.0
Built on Nov 13 2023, 18:10:19
Compiled with CUDA Runtime 11.4, using CUDA driver 11.4
The AMGX_initialize_plugins API call is deprecated and can be safely removed.

FGMRES_AGGREGATION.json

Config max_level = 1

     AMG Grid:
     Number of Levels: 1
        LVL         ROWS               NNZ  PARTS    SPRSTY       Mem (GB)
       0(D)         8423             36031      1  0.000508       0.000528
     ----------------------------------------------------------------------
     Grid Complexity: 1
     Operator Complexity: 1
     Total Memory Usage: 0.000528205 GB
     ----------------------------------------------------------------------
       iter      Mem Usage (GB)       residual           rate
     ----------------------------------------------------------------------
        Ini             2.18726   9.093929e+02
          0             2.18726   2.781798e-11         0.0000
     ----------------------------------------------------------------------
     Total Iterations: 1
     Avg Convergence Rate:               0.0000
     Final Residual:           2.781798e-11
     Total Reduction in Residual:      3.058962e-14
     Maximum Memory Usage:                2.187 GB
     ----------------------------------------------------------------------

     Total Time: 1.10915
         setup: 1.10528 s
         solve: 0.00387482 s
         solve(per iteration): 0.00387482 s

With config max_level = 100:

     AMG Grid:
     Number of Levels: 4
        LVL         ROWS               NNZ  PARTS    SPRSTY       Mem (GB)
       0(D)         8423             36031      1  0.000508       0.000611
       1(D)         1371              9123      1   0.00485       0.000264
       2(D)          592              4060      1    0.0116       0.000117
       3(D)          263              1715      1    0.0248       4.62e-05
     ----------------------------------------------------------------------
     Grid Complexity: 1.26428
     Operator Complexity: 1.41348
     Total Memory Usage: 0.00103818 GB

     ----------------------------------------------------------------------
     Total Iterations: 788
     Avg Convergence Rate:               0.9654
     Final Residual:           8.014293e-10
     Total Reduction in Residual:      8.812795e-13
     Maximum Memory Usage:                1.023 GB
     ----------------------------------------------------------------------

     Total Time: 3.97606
         setup: 0.00537088 s
         solve: 3.97069 s
         solve(per iteration): 0.00503894 s
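The figures in the max_level = 100 log can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers copied from that log. It assumes that "Avg Convergence Rate" is the geometric mean of the per-iteration residual reduction; that assumption reproduces the printed 0.9654, but it is not confirmed anywhere in this thread.

```python
# Numbers copied from the max_level = 100 log.
iterations = 788
setup_s = 0.00537088
solve_s = 3.97069
total_reduction = 8.812795e-13  # "Total Reduction in Residual"

# Time per iteration is the solve time divided by the iteration count.
per_iteration_s = solve_s / iterations

# Total time is setup plus solve.
total_s = setup_s + solve_s

# Assumption: the average rate is the geometric mean of the reduction.
avg_rate = total_reduction ** (1.0 / iterations)

print(f"solve per iteration: {per_iteration_s:.8f} s")
print(f"total time:          {total_s:.5f} s")
print(f"avg convergence:     {avg_rate:.4f}")
```

A rate of about 0.965 per iteration means each iteration removes only about 3.5% of the residual, which is why 788 iterations are needed and why almost all of the 3.98 s is spent in the solve phase rather than in setup.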

    "use_scalar_norm": 1, 
    "print_solve_stats": 1, 
    "solver": "FGMRES", 
    "obtain_timings": 1, 
    "max_iters": 1000, 
    "monitor_residual": 1, 
    "gmres_n_restart": 400, 
    "convergence": "RELATIVE_INI_CORE", 
    "scope": "main", 
    "tolerance": 1e-08, 
    "norm": "L1"

"max_iters": 1000 and "gmres_n_restart": 400 have to be set this high for the result to match the one obtained with max_level = 1. Is there a more effective way to configure this?

output_vectorX.txt Matrix.mtx.txt

marsaev commented 7 months ago

I actually can't reproduce your results with your matrix and configs. The matrix also seems to be pretty ill-conditioned. Can you confirm that?

Regardless, since you are running on an A10, I would recommend switching to fp32 precision (if you haven't already), as fp64 throughput is almost non-existent on that GPU. You can add -mode dFFI to the AMGX examples to enable it.
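As a rough check on what fp32 can resolve, the sketch below compares single-precision machine epsilon with the "tolerance" of 1e-08 from the posted config. It uses only the Python standard library and does not call AMGX; the helper name `to_fp32` is hypothetical and introduced only for illustration. In the mode string, `d` selects device execution, the two `F`s select float matrix and vector data, and `I` selects int indices.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


FP32_EPS = 2.0 ** -23  # spacing between 1.0 and the next fp32 value
tolerance = 1e-08      # "tolerance" from the posted config

# A relative change of 1e-08 is below fp32 resolution and rounds away.
lost_in_fp32 = to_fp32(1.0 + tolerance) == 1.0
# A change of one machine epsilon is still representable.
kept_in_fp32 = to_fp32(1.0 + FP32_EPS) != 1.0

print(f"fp32 epsilon: {FP32_EPS:.3e}")
print(f"tolerance below fp32 epsilon: {tolerance < FP32_EPS}")
print(f"1 + tolerance rounds to 1 in fp32: {lost_in_fp32}")
```

Because the requested relative tolerance is smaller than fp32 epsilon, a pure fp32 run may stall before reaching it. Loosening "tolerance" is one thing to try in that case; whether the resulting accuracy is acceptable depends on the application.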

Looking at your output for

FGMRES_AGGREGATION.json
Config max_level = 1

Do you want to construct just one additional multigrid level, or skip multigrid altogether (and use something like GMRES with Jacobi as the preconditioner)? A setup time of around one second is a sign that something is wrong, since multigrid should barely be involved.
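A minimal sketch of the second option (a Krylov solver with a Jacobi preconditioner and no multigrid hierarchy) could be generated as follows. The outer solver keys are copied from the fragment posted above, keeping FGMRES as in the original config. The `config_version` key, the nested `preconditioner` block, the `BLOCK_JACOBI` solver name, and the `precond` scope follow the layout of the sample configs shipped with AMGX, but they are assumptions here and not settings confirmed in this thread.

```python
import json

# Outer solver: keys copied from the posted config fragment.
main_solver = {
    "solver": "FGMRES",
    "scope": "main",
    "use_scalar_norm": 1,
    "print_solve_stats": 1,
    "obtain_timings": 1,
    "monitor_residual": 1,
    "max_iters": 1000,
    "gmres_n_restart": 400,
    "convergence": "RELATIVE_INI_CORE",
    "tolerance": 1e-08,
    "norm": "L1",
    # Assumption: one Jacobi sweep as preconditioner instead of AMG.
    "preconditioner": {
        "solver": "BLOCK_JACOBI",
        "scope": "precond",
        "max_iters": 1,
        "monitor_residual": 0,
        "print_solve_stats": 0,
    },
}

# Assumption: version-2 layout with the solver tree under "solver".
config = {"config_version": 2, "solver": main_solver}

config_text = json.dumps(config, indent=4)
print(config_text)
```

The printed JSON can be saved to a file and passed to an AMGX example in place of FGMRES_AGGREGATION.json. Since no hierarchy is built, the setup phase should be short; whether Jacobi is a strong enough preconditioner for this matrix would have to be tested.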

In your 4-level run I see that the setup time is actually lower than with max_level=1, which is another sign that something's fishy :) Can you share the full configs you used for both runs?

For such a small matrix (about 8k rows), other methods might be more effective too.

marsaev commented 1 week ago

Let us know if you have any more questions.