Open stgeke opened 3 years ago
Looking at
https://github.com/NVIDIA/AMGX/blob/main/base/include/amgx_c.h#L570
It seems like nglobal is int?
On 23 May 2021, at 13:36, marsaev @.***> wrote:
The code instantiates only int as the index type for a rank partition or a single-GPU matrix. However, the global matrix (spanned across multiple ranks) is indexed with int64 global indices: col_indices_global in the AMGX_matrix_upload_all_global() API is assumed to be of type int64_t.
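To illustrate the distinction made above, here is a minimal sketch (not AmgX code) that checks whether a global row count and an evenly split per-rank row count fit in a signed 32-bit integer. The helper names `fits_int32` and `index_widths`, and the 3e9-row / 64-rank example, are hypothetical and chosen only for illustration.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold


def fits_int32(value: int) -> bool:
    """Return True if value is a non-negative number representable as int32."""
    return 0 <= value <= INT32_MAX


def index_widths(global_rows: int, num_ranks: int) -> dict:
    """Report whether the global and per-rank row counts fit in int32.

    Assumes rows are split as evenly as possible across ranks
    (an illustrative assumption, not something AmgX requires).
    """
    rows_per_rank = -(-global_rows // num_ranks)  # ceiling division
    return {
        "global_rows_fit_int32": fits_int32(global_rows),
        "rows_per_rank_fit_int32": fits_int32(rows_per_rank),
    }


# Hypothetical case: 3e9 global rows spread over 64 ranks.
print(index_widths(3_000_000_000, 64))
```

In this hypothetical case the per-rank count fits in an int while the global count does not, which is why a 32-bit global row count argument can become the bottleneck even when each partition is small.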
Good catch. I'm actually surprised that it's an int; I'm pretty sure we specifically had a case with >2B rows some time ago.
We plan a few API changes this summer and will test and incorporate a 64-bit number of rows.
Any updates concerning this?
Any updates concerning this?
Bump
I'm working on some changes to the way that types are selected in AmgX as part of a larger piece of work to reduce some of the barriers to development of AmgX, while generally improving the user experience. The lack of support for 64-bit integer row counts is something I have also encountered in projects I have accelerated recently with AmgX, and see as a relatively high priority.
As such, I can confirm this is something I will address this year unless someone else gets there first.
Hello,
Just to note: to check the ability of AmgX to solve systems with more than 2B rows, I ran the amgx_mpi_poisson7 test. It runs with 2 048 000 000 DOF on 64 GPUs (A100 with 80 GB of memory):
srun -n 64 amgx_mpi_poisson7 -mode dDDI -p 200 400 400 4 4 4 -c config
AMG Grid:
Number of Levels: 7
LVL ROWS NNZ PARTS SPRSTY Mem (GB)
----------------------------------------------------------------------
0(D) 1024000000 7160320000 32 6.83e-09 109
1(D) 321638311 18106362477 32 1.75e-07 436
2(D) 35075078 6134541300 32 4.99e-06 158
3(D) 2230884 820160005 32 0.000165 27.2
4(D) 129489 74410282 32 0.00444 4.77
5(D) 6731 4243789 32 0.0937 0.874
6(D) 441 122160 32 0.628 0.0717
----------------------------------------------------------------------
Grid Complexity: 1.35066
Operator Complexity: 4.51099
Total Memory Usage: 736.832 GB
----------------------------------------------------------------------
iter Mem Usage (GB) residual rate
----------------------------------------------------------------------
Ini 51.1258 3.200000e+04
0 51.1258 6.606039e+05 20.6439
1 51.1258 1.738751e+05 0.2632
2 51.1258 9.409743e+04 0.5412
3 51.1258 2.634899e+04 0.2800
4 51.1258 9.909617e+03 0.3761
5 51.1258 3.547899e+03 0.3580
6 51.1258 1.150241e+03 0.3242
7 51.1258 3.814637e+02 0.3316
8 51.1258 1.296999e+02 0.3400
9 51.1258 4.284031e+01 0.3303
10 51.1258 1.477877e+01 0.3450
11 51.1258 4.634529e+00 0.3136
12 51.1258 1.435099e+00 0.3097
13 51.1258 4.461860e-01 0.3109
14 51.1258 1.468895e-01 0.3292
15 51.1258 4.984580e-02 0.3393
16 51.1258 1.533196e-02 0.3076
17 51.1258 5.271090e-03 0.3438
18 51.1258 1.782666e-03 0.3382
----------------------------------------------------------------------
Total Iterations: 19
Avg Convergence Rate: 0.4152
Final Residual: 1.782666e-03
Total Reduction in Residual: 5.570831e-08
Maximum Memory Usage: 51.126 GB
----------------------------------------------------------------------
Total Time: 20.3998
setup: 19.4673 s
solve: 0.932517 s
solve(per iteration): 0.0490799 s
... but it crashed when doubling the mesh size (4 096 000 000 DOF):
srun -n 64 amgx_mpi_poisson7 -mode dDDI -p 400 400 400 4 4 4 -c config
AMGX ERROR: file /ccc/scratch/cont002/den/ledacp/trust/amgx_openmp_int64/ThirdPart/src/LIBAMGX/AmgX/src/amgx_c.cu line 2755
AMGX ERROR: Thrust failure.
With config:
config_version=2
solver(pcgf)=PCG
determinism_flag=1
pcgf:preconditioner(prec)=AMG
pcgf:use_scalar_norm=1
pcgf:max_iters=10000
pcgf:convergence=RELATIVE_INI_CORE
pcgf:tolerance=1e-7
pcgf:norm=L2
pcgf:print_solve_stats=1
pcgf:monitor_residual=1
pcgf:obtain_timings=1
prec:error_scaling=0
prec:print_grid_stats=1
prec:max_iters=1
prec:cycle=V
prec:min_coarse_rows=2
prec:max_levels=100
prec:smoother(smoother)=BLOCK_JACOBI
prec:presweeps=1
prec:postsweeps=1
prec:coarsest_sweeps=1
prec:coarse_solver(c_solver)=DENSE_LU_SOLVER
prec:dense_lu_num_rows=2
prec:algorithm=CLASSICAL
#prec:selector=HMIS
# Much faster for setup:
prec:selector=PMIS
prec:interpolator=D2
prec:strength=AHAT
smoother:relaxation_factor=0.8
I am using the latest AmgX version, so unfortunately I guess there has been no progress on this.
Thanks
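For reference, the DOF counts quoted for the two 64-rank commands above can be reproduced with a short sketch. It assumes the `-p` arguments are the local grid size per rank followed by the process grid (nx ny nz px py pz), which matches the arithmetic used in this thread; the helper name `total_dof` is hypothetical.

```python
INT32_MAX = 2**31 - 1  # 2 147 483 647


def total_dof(nx, ny, nz, px, py, pz):
    """Global unknowns, assuming an nx*ny*nz local grid on each of px*py*pz ranks."""
    return nx * ny * nz * px * py * pz


# The two 64-rank runs from the commands above.
for args in [(200, 400, 400, 4, 4, 4), (400, 400, 400, 4, 4, 4)]:
    dof = total_dof(*args)
    status = "exceeds int32" if dof > INT32_MAX else "fits in int32"
    print(args, dof, status)
```

Under that assumption the first run stays just below the signed 32-bit limit, while the second run is well above it.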
Oops, it works with:
srun -n 80 amgx_mpi_poisson7 -mode dDDI -p 300 400 400 5 4 4 -c config
Number of Levels: 6
LVL ROWS NNZ PARTS SPRSTY Mem (GB)
----------------------------------------------------------------------
0(D) 2560000000 17908480000 80 2.73e-09 274
1(D) 793456455 44794412303 80 7.12e-08 1.09e+03
2(D) 87410454 15526788091 80 2.03e-06 413
3(D) 5507949 2071390723 80 6.83e-05 74.4
4(D) 315703 193178409 80 0.00194 14.5
5(D) 15796 12171321 80 0.0488 3.82
----------------------------------------------------------------------
Grid Complexity: 1.34637
Operator Complexity: 4.49544
Total Memory Usage: 1871.14 GB
----------------------------------------------------------------------
iter Mem Usage (GB) residual rate
----------------------------------------------------------------------
Ini 51.1425 5.059644e+04
0 51.1425 1.742094e+06 34.4312
1 51.1425 1.468481e+06 0.8429
2 51.1425 5.676792e+05 0.3866
3 51.1425 2.467121e+05 0.4346
4 51.1425 6.569428e+04 0.2663
5 51.1425 1.236653e+04 0.1882
6 51.1425 3.125412e+03 0.2527
7 51.1425 2.090049e+03 0.6687
8 51.1425 7.675394e+02 0.3672
9 51.1425 2.230874e+02 0.2907
10 51.1425 5.848744e+01 0.2622
11 51.1425 1.706309e+01 0.2917
12 51.1425 6.254182e+00 0.3665
13 51.1425 1.529237e+00 0.2445
14 51.1425 3.225381e-01 0.2109
15 51.1425 1.107974e-01 0.3435
16 51.1425 4.094373e-02 0.3695
17 51.1425 1.333030e-02 0.3256
18 51.1425 6.989986e-03 0.5244
19 51.1425 2.073903e-03 0.2967
----------------------------------------------------------------------
Total Iterations: 20
Avg Convergence Rate: 0.4272
Final Residual: 2.073903e-03
Total Reduction in Residual: 4.098912e-08
Maximum Memory Usage: 51.143 GB
----------------------------------------------------------------------
Total Time: 25.8384
setup: 24.7648 s
solve: 1.0736 s
solve(per iteration): 0.0536801 s
So, level 0 has 2 560 000 000 rows (2.56B unknowns), which is greater than 2^31.
So, good news. Now I naively ask: what is the upper limit of AmgX? Thanks to anyone who can answer.
Regarding "upper limit of AmgX?": there are two types of limitations, software (API parameter ranges) and hardware (memory capacity to fit the matrix). The first is limited by the parameter types; if something is not enough for your case, let us know. Was your initial issue partially or fully fixed by the change mentioned in this issue? For the second, it's hard to provide good estimates of peak memory usage for multigrid, as it is very case and configuration dependent. Trial and error will give you a better picture here.
OK, thanks marsaev for your answer. I understand that one of the limitations is the memory capacity of each device (which probably caused the crash for the 400x400x400=64e6 cells mesh). I did this test because I feared, based on @mattmartineau's comment, that AmgX could have issues with a number of rows bigger than 2B. That seems not to be the case. I will confirm with my code, which uses AmgXWrapper + AmgX; for now I needed to fix some 64-bit integer issues in AmgXWrapper.
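As a rough check of the memory explanation, the sketch below extrapolates the peak usage reported above (about 51.1 GB with 2 560 000 000 rows over 80 parts, so 32e6 rows per GPU) to 64e6 rows per GPU. Linear scaling with the local row count is purely an assumption here, since multigrid memory use is case and configuration dependent, and the helper name `extrapolate_peak_gb` is hypothetical.

```python
def extrapolate_peak_gb(observed_gb, observed_rows_per_gpu, target_rows_per_gpu):
    """Scale an observed peak memory figure linearly with rows per GPU.

    Linear scaling is a crude assumption, not a property of AmgX.
    """
    return observed_gb * target_rows_per_gpu / observed_rows_per_gpu


# From the log above: 2 560 000 000 level-0 rows over 80 parts, 51.143 GB peak.
observed_rows_per_gpu = 2_560_000_000 // 80
target_rows_per_gpu = 400 * 400 * 400  # local mesh of the run that crashed

estimate = extrapolate_peak_gb(51.143, observed_rows_per_gpu, target_rows_per_gpu)
print(f"estimated peak: {estimate:.1f} GB per GPU (A100 capacity: 80 GB)")
```

Under this assumption the estimate lands above the 80 GB available on each A100, which is consistent with the crash being a memory-capacity problem rather than an index-width one, though only a trial run can confirm it.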