leemunseon closed this 1 week ago
I actually can't reproduce your results with your matrix and configs. The matrix also seems to be pretty ill-conditioned; can you confirm that?
Regardless, since you are running on an A10, I would recommend switching to fp32 precision (if you haven't already), as fp64 hardware support is almost non-existent there. You can add -mode dFFI to the AMGX examples to enable it.
Looking at your output for FGMRES_AGGREGATION.json with max_level = 1: do you want to construct just one additional level of multigrid, or skip multigrid altogether (and use something like GMRES with Jacobi as the preconditioner)? A setup time of around one second is a sign that something is wrong, since multigrid should be almost uninvolved.
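To make the "skip multigrid altogether" option concrete, here is a minimal sketch of what such a config could look like, built in Python and dumped as JSON. The key names are assumed to match the JSON configs shipped with AMGX (for example FGMRES_AGGREGATION.json) and should be checked against your AMGX version. The max_iters and gmres_n_restart values are the ones quoted in this issue. The tolerance, the scope name "jacobi", and the Jacobi sweep settings are placeholders, not recommendations.

```python
import json

# Sketch only: FGMRES outer solver with a Jacobi preconditioner and no
# multigrid hierarchy. Key names are assumed to match the AMGX JSON config
# format; verify them against the configs shipped with your AMGX build.
config = {
    "config_version": 2,
    "solver": {
        "scope": "main",
        "solver": "FGMRES",
        "max_iters": 1000,          # value quoted in this issue
        "gmres_n_restart": 400,     # value quoted in this issue
        "tolerance": 1e-4,          # placeholder
        "norm": "L2",
        "convergence": "RELATIVE_INI_CORE",
        "monitor_residual": 1,
        "print_solve_stats": 1,
        "obtain_timings": 1,
        "preconditioner": {
            "scope": "jacobi",          # placeholder scope name
            "solver": "BLOCK_JACOBI",
            "max_iters": 3,             # placeholder number of sweeps
            "relaxation_factor": 0.75,  # placeholder damping factor
        },
    },
}

# Serialize to the text that would be saved as a .json config file.
config_text = json.dumps(config, indent=4)
print(config_text)
```

Comparing the setup time of a config like this against the max_level = 1 run would show how much of that one second is actually spent building the multigrid level.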
In your max_levels=4 run I see that the setup time is actually lower than with max_level=1, which is another sign that something's fishy :) Can you share the full configs you used for both runs?
For such a small matrix (8k entries), other methods might be more effective too.
Let us know if you have any more questions.
I solve Ax=b once. With "max_level = 1" the solution matches the CPU result, but the run is slower than on the CPU.
Increasing "max_level" reduces the setup time but increases the iteration count.
Is there a way to reduce the setup and solve time?
Or can time be saved by removing other unnecessary settings?
NVIDIA A10, AMGX version 2.4.0
Built on Nov 13 2023, 18:10:19
Compiled with CUDA Runtime 11.4, using CUDA driver 11.4
The AMGX_initialize_plugins API call is deprecated and can be safely removed.
FGMRES_AGGREGATION.json
Config max_level = 1
AMG Grid: Number of Levels: 1 LVL ROWS NNZ PARTS SPRSTY Mem (GB)
Total Time: 1.10915 setup: 1.10528 s solve: 0.00387482 s solve(per iteration): 0.00387482 s
Config max_level = 100 gives:
AMG Grid: Number of Levels: 4 LVL ROWS NNZ PARTS SPRSTY Mem (GB)
Total Time: 3.97606 setup: 0.00537088 s solve: 3.97069 s solve(per iteration): 0.00503894 s
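The two timing reports can be compared more directly by dividing the solve time by the per-iteration time, which gives the number of iterations each run took. The sketch below does that arithmetic on the two "Total Time" lines quoted in this issue. The function name `iterations_from_timings` is hypothetical, and it assumes the per-iteration figure is simply the solve time divided by the iteration count.

```python
import re

def iterations_from_timings(line: str) -> int:
    """Estimate the iteration count as solve time / per-iteration time.

    Hypothetical helper for reading the timing summary; not part of AMGX.
    """
    number = r"([0-9.eE+-]+)"
    # "solve:" does not match "solve(per iteration):", so each pattern
    # picks up exactly one of the two figures.
    solve = float(re.search(r"solve:\s*" + number, line).group(1))
    per_iter = float(
        re.search(r"solve\(per iteration\):\s*" + number, line).group(1)
    )
    return round(solve / per_iter)

# Timing lines copied from the two runs reported in this issue.
one_level = ("Total Time: 1.10915 setup: 1.10528 s solve: 0.00387482 s "
             "solve(per iteration): 0.00387482 s")
four_levels = ("Total Time: 3.97606 setup: 0.00537088 s solve: 3.97069 s "
               "solve(per iteration): 0.00503894 s")

print(iterations_from_timings(one_level))    # 1
print(iterations_from_timings(four_levels))  # 788
```

So the max_level = 1 run finishes in a single iteration after a 1.1 s setup, while the 4-level run has a cheap setup but needs roughly 788 iterations at about 5 ms each.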
I have to set "max_iters": 1000 and "gmres_n_restart": 400 to get the same values as with max_level = 1. Is there a more effective way to set this up?
output_vectorX.txt Matrix.mtx.txt