barbagroup / AmgXWrapper

AmgXWrapper: An interface between PETSc and the NVIDIA AmgX library
MIT License

Poisson test case with new API (AmgX_CSR) #33

Open pledac opened 3 years ago

pledac commented 3 years ago

Hello @mattmartineau, thanks for your new API.

I am playing with it and running the poisson test case, where it is possible to update the coefficients of the matrix without rebuilding it. But I am surprised that, with an AMG preconditioner, the setup is done again. Did I miss something?

Thanks,

./poisson -caseName log -mode AmgX_CSR -cfgFileName ./AmgX_SolverOptions_Classical.info -Nx 10 -Ny 10 -Nz 10


     Case Name: log
     Nx: 10 Ny: 10 Nz: 10
     Mode: AmgX_CSR
     Config File: ./AmgX_SolverOptions_Classical.info
     Number of Solves: 10
     Output PETSc Log File ? false

     ========================================================================
     AMGX version 2.1.0.131-opensource
     Built on Mar 14 2021, 16:01:18
     Compiled with CUDA Runtime 10.2, using CUDA driver 11.0
     Cannot read file as JSON object, trying as AMGX config
     Cannot read file as JSON object, trying as AMGX config
     Converting config string to current config version
     Parsing configuration string: exception_handling=1 ;
     Using Normal MPI (Hostbuffer) communicator...
     AMG Grid:
     Number of Levels: 4
        LVL         ROWS               NNZ    SPRSTY       Mem (GB)
       0(D)         1000              6400    0.0064       0.000101
       1(D)          500              7760     0.031       0.000193
       2(D)           85              4245     0.588       9.82e-05
       3(D)           10               100         1       2.55e-06
     --------------------------------------------------------------
     Grid Complexity: 1.595
     Operator Complexity: 2.89141
     Total Memory Usage: 0.000395462 GB
       iter      Mem Usage (GB)       residual           rate
        Ini            0.762817   1.418968e+03
          0            0.762817   4.049953e+02         0.2854
          1              0.7628   2.272882e+01         0.0561
          2              0.7628   2.135450e+00         0.0940
          3              0.7628   1.660107e-01         0.0777
          4              0.7628   1.927763e-02         0.1161
          5              0.7628   2.591171e-03         0.1344
          6              0.7628   2.747955e-04         0.1061
          7              0.7628   2.890709e-05         0.1052
     Total Iterations: 8
     Avg Convergence Rate: 0.1093
     Final Residual: 2.890709e-05
     Total Reduction in Residual: 2.037191e-08
     Maximum Memory Usage: 0.763 GB

     Total Time: 0.0165809
         setup: 0.0125757 s
         solve: 0.00400515 s
         solve(per iteration): 0.000500644 s
     AMG Grid:
     Number of Levels: 4
        LVL         ROWS               NNZ    SPRSTY       Mem (GB)
       0(D)         1000              6400    0.0064       0.000101
       1(D)          500              7760     0.031       0.000193
       2(D)           85              4245     0.588       9.82e-05
       3(D)           10               100         1       2.55e-06
     --------------------------------------------------------------
     Grid Complexity: 1.595
     Operator Complexity: 2.89141
     Total Memory Usage: 0.000395462 GB
       iter      Mem Usage (GB)       residual           rate
        Ini            0.762817   2.890709e-05
          0            0.762817   4.493219e-06         0.1554
          1              0.7628   5.017173e-07         0.1117
          2              0.7628   5.233455e-08         0.1043
          3              0.7628   5.699205e-09         0.1089
          4              0.7628   6.339535e-10         0.1112
          5              0.7628   7.636473e-11         0.1205
          6              0.7628   8.610483e-12         0.1128
          7              0.7628   8.893718e-13         0.1033
     Total Iterations: 8
     Avg Convergence Rate: 0.1151
     Final Residual: 8.893718e-13
     Total Reduction in Residual: 3.076656e-08
     Maximum Memory Usage: 0.763 GB

     Total Time: 0.0161896
         setup: 0.0123392 s
         solve: 0.00385043 s
         solve(per iteration): 0.000481304 s
     2-Norm: 0.986988
     Max-Norm: 0.0577365
     Iterations 9

========================================================================

End of log

========================================================================

mattmartineau commented 3 years ago

In this API we have the "updateA" call, which replaces the matrix coefficients and then performs a lighter-weight resetup (rather than the full setup). I am working internally to make this resetup even faster, so in the future this should represent a substantial improvement in performance.
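To make the idea of "replacing the coefficients" concrete, here is a minimal sketch of a CSR matrix whose sparsity pattern (row offsets and column indices) is fixed, while the values array can be overwritten. The class `CsrMatrix` and its method `update_values` are hypothetical names chosen for illustration; this is not the AmgXWrapper or AmgX API, and it does not model the resetup step at all.

```python
class CsrMatrix:
    """Toy CSR matrix whose sparsity pattern is fixed after construction."""

    def __init__(self, row_offsets, col_indices, values):
        if len(col_indices) != len(values):
            raise ValueError("col_indices and values must have the same length")
        if row_offsets[-1] != len(values):
            raise ValueError("last row offset must equal the number of nonzeros")
        self.row_offsets = list(row_offsets)
        self.col_indices = list(col_indices)
        self.values = list(values)

    def update_values(self, new_values):
        """Overwrite the coefficients; the pattern is left untouched."""
        if len(new_values) != len(self.values):
            raise ValueError("the number of nonzeros must not change")
        self.values[:] = new_values

    def matvec(self, x):
        """Return the product A @ x as a list."""
        y = []
        for row in range(len(self.row_offsets) - 1):
            start, end = self.row_offsets[row], self.row_offsets[row + 1]
            y.append(sum(self.values[k] * x[self.col_indices[k]]
                         for k in range(start, end)))
        return y


# 3x3 tridiagonal matrix [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
A = CsrMatrix([0, 2, 5, 7],
              [0, 1, 0, 1, 2, 1, 2],
              [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0])
before = A.matvec([1.0, 1.0, 1.0])

# Same pattern, new diagonal: only the values array is replaced.
A.update_values([4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0])
after = A.matvec([1.0, 1.0, 1.0])
print(before, after)
```

The point of keeping the pattern fixed is that everything derived only from the structure can, in principle, be kept between updates; how much of the AMG hierarchy a real resetup actually reuses is a separate question that this sketch does not answer.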

It is of course possible to construct an API that would allow uploading coefficients without performing the resetup at all, and I have worked on some applications where this can be useful. It tends to be cases where you are willing to sacrifice some accuracy for the sake of avoiding the setup costs for some number of solve steps. Do you have a particular use case where this is applicable?

pledac commented 3 years ago

> In this API we have the "updateA" call which replaces the matrix coefficients and then performs a lighter weight resetup (rather than the full setup).

Oh yes, I read that but forgot it... I will check on my largest test case how much lighter the resetup is, thanks.

> I am working internally to make this resetup even faster, so in the future this should represent a substantial improvement in performance.

Great.

> It is of course possible to construct an API that would allow uploading coefficients without performing the resetup at all, and I have worked on some applications where this can be useful. It tends to be cases where you are willing to sacrifice some accuracy for the sake of avoiding the setup costs for some number of solve steps. Do you have a particular use case where this is applicable?

PETSc provides such a feature (https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetReusePreconditioner.html). A particular use case may be Newton's method (I should test this in my code), where keeping the setup from the first iteration could pay off despite some loss of accuracy, since the Jacobian changes only slightly between iterations.
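As a rough illustration of the trade-off in reusing a preconditioner after the matrix has changed, here is a small self-contained sketch. It swaps in a much simpler technique than AMG: a Richardson iteration with a diagonal (Jacobi) preconditioner on a 5x5 tridiagonal matrix. The matrices, the helper names, and the size of the perturbation are all assumptions made for the example; nothing here calls PETSc or AmgX.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def richardson(A, b, precond_diag, tol=1e-10, max_iter=1000):
    """Preconditioned Richardson: x += M^{-1} (b - A x), with M diagonal."""
    x = [0.0] * len(b)
    for iteration in range(max_iter):
        residual = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        if max(abs(r) for r in residual) < tol:
            return x, iteration
        x = [xi + r / m for xi, r, m in zip(x, residual, precond_diag)]
    raise RuntimeError("did not converge")


def tridiagonal(n, diag, off):
    """Build an n x n tridiagonal matrix with constant diagonals."""
    return [[diag if i == j else off if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]


n = 5
A_old = tridiagonal(n, 4.0, -1.0)   # matrix the preconditioner was built for
A_new = tridiagonal(n, 5.0, -1.0)   # matrix after the coefficients changed
b = [1.0] * n

fresh_diag = [A_new[i][i] for i in range(n)]  # preconditioner rebuilt
stale_diag = [A_old[i][i] for i in range(n)]  # preconditioner reused

x_fresh, fresh_iters = richardson(A_new, b, fresh_diag)
x_stale, stale_iters = richardson(A_new, b, stale_diag)
print("iterations with rebuilt preconditioner:", fresh_iters)
print("iterations with reused preconditioner: ", stale_iters)
```

Both runs solve the system for the updated matrix, so in this toy setting the price of reuse shows up as extra iterations rather than as a different answer. Whether the saved setup time outweighs those extra iterations depends on the problem and would have to be measured.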

Thanks for answering about the new CSR API.

pledac commented 3 years ago

I ran the poisson test case on a V100 card with a bigger problem size. Unfortunately, there is no gain from the resetup.

./poisson -caseName log -mode AmgX_CSR -cfgFileName ./AmgX_SolverOptions_Classical.info -Nx 100 -Ny 100 -Nz 100

     Case Name: AmgX_CSR_100x100x100
     Nx: 100 Ny: 100 Nz: 100
     Mode: AmgX_CSR
     Config File: AmgX_SolverOptions_Classical.info
     Number of Solves: 10
     Output PETSc Log File ? false

     ========================================================================
     AMGX version 2.1.0.131-opensource
     Built on Apr 11 2021, 16:29:16
     Compiled with CUDA Runtime 10.2, using CUDA driver 10.2
     Cannot read file as JSON object, trying as AMGX config
     Cannot read file as JSON object, trying as AMGX config
     Converting config string to current config version
     Parsing configuration string: exception_handling=1 ;
     Using Normal MPI (Hostbuffer) communicator...
     AMG Grid:
     Number of Levels: 7
        LVL         ROWS               NNZ    SPRSTY       Mem (GB)

       0(D)      1000000           6940000  6.94e-06          0.107
       1(D)       500000           9320600  3.73e-05          0.228
       2(D)        83338          10681050   0.00154          0.242
       3(D)        10337           3323825    0.0311         0.0747
       4(D)          782            294852     0.482        0.00662
       5(D)           69              4761         1       0.000109
       6(D)            6                36         1       9.98e-07
     --------------------------------------------------------------
     Grid Complexity: 1.59453
     Operator Complexity: 4.4042
     Total Memory Usage: 0.658982 GB
     --------------------------------------------------------------
       iter      Mem Usage (GB)       residual           rate
     --------------------------------------------------------------
        Ini             2.30548   7.468230e+04
          0             2.30548   2.586563e+04         0.3463
          1              2.3055   5.702930e+03         0.2205
          2              2.3055   4.112957e+02         0.0721
          3              2.3055   3.034172e+01         0.0738
          4              2.3055   3.429337e+00         0.1130
          5              2.3055   4.298335e-01         0.1253
          6              2.3055   3.945168e-02         0.0918
          7              2.3055   4.581627e-03         0.1161
     --------------------------------------------------------------
     Total Iterations: 8
     Avg Convergence Rate:               0.1255
     Final Residual:           4.581627e-03
     Total Reduction in Residual:      6.134823e-08
     Maximum Memory Usage:                2.305 GB
     --------------------------------------------------------------

     Total Time: 3.58589
         setup: 3.56315 s
         solve: 0.0227373 s
         solve(per iteration): 0.00284217 s
     AMG Grid:
     Number of Levels: 7
        LVL         ROWS               NNZ    SPRSTY       Mem (GB)

       0(D)      1000000           6940000  6.94e-06          0.107
       1(D)       500000           9320600  3.73e-05          0.228
       2(D)        83338          10681050   0.00154          0.242
       3(D)        10337           3323825    0.0311         0.0747
       4(D)          782            294852     0.482        0.00662
       5(D)           69              4761         1       0.000109
       6(D)            6                36         1       9.98e-07
     --------------------------------------------------------------
     Grid Complexity: 1.59453
     Operator Complexity: 4.4042
     Total Memory Usage: 0.658982 GB
     --------------------------------------------------------------
       iter      Mem Usage (GB)       residual           rate
     --------------------------------------------------------------
        Ini             2.30548   4.581627e-03
          0             2.30548   8.070969e-04         0.1762
          1              2.3055   7.323379e-05         0.0907
          2              2.3055   8.564718e-06         0.1170
          3              2.3055   8.627607e-07         0.1007
          4              2.3055   9.406064e-08         0.1090
          5              2.3055   1.163520e-08         0.1237
          6              2.3055   1.156623e-09         0.0994
          7              2.3055   1.194955e-10         0.1033
     --------------------------------------------------------------
     Total Iterations: 8
     Avg Convergence Rate:               0.1127
     Final Residual:           1.194955e-10
     Total Reduction in Residual:      2.608145e-08
     Maximum Memory Usage:                2.305 GB
     --------------------------------------------------------------

     Total Time: 3.53631
         setup: 3.5141 s
         solve: 0.022203 s
         solve(per iteration): 0.00277538 s
     2-Norm: 0.348553
     Max-Norm: 0.00065713
     Iterations 9

========================================================================

End of AmgX_CSR_100x100x100

========================================================================
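To put numbers on "no gain", the timing lines from the 100x100x100 run above can be compared directly. The sketch below is only a convenience script: the regular expression and the function name `parse_timings` are my own, and the two log lines are copied from the output above. From those figures, the second setup is about 1.4% shorter than the first, and setup accounts for more than 99% of the total time in both cases.

```python
import re

# Matches "Total Time: T setup: S s solve: V s", also across line breaks.
TIMING = re.compile(
    r"Total Time:\s*([\d.eE+-]+)\s+"
    r"setup:\s*([\d.eE+-]+)\s*s\s+"
    r"solve:\s*([\d.eE+-]+)\s*s"
)


def parse_timings(log_text):
    """Return a list of (total, setup, solve) tuples in seconds."""
    return [tuple(float(g) for g in m.groups())
            for m in TIMING.finditer(log_text)]


# Timing lines copied from the 100x100x100 run.
log = """
Total Time: 3.58589 setup: 3.56315 s solve: 0.0227373 s solve(per iteration): 0.00284217 s
Total Time: 3.53631 setup: 3.5141 s solve: 0.022203 s solve(per iteration): 0.00277538 s
"""

timings = parse_timings(log)
first_setup = timings[0][1]
for total, setup, solve in timings:
    print(f"setup {setup:.4f} s, {100 * setup / total:.1f}% of total, "
          f"{setup / first_setup:.3f}x the first setup")
```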