Hi @Jaberh, for every config (file or string) AMGX first tries to parse it as JSON, and if that fails it tries to process it as a 'plain text' config (see https://github.com/NVIDIA/AMGX/blob/master/doc/AMGX_Reference.pdf). I strongly recommend sticking to the JSON configs as they are simpler to handle.
Converting config string to current config version Error parsing parameter string: Incorrect config entry (number of equal signs is not 1) : "config_version": 2
It seems that it cannot parse your config as JSON for some reason and then cannot parse it as a plain-text config either (because it is not one), which results in a failure. It is odd that Release and Debug differ in this manner. Which host compiler do you use? Do you want to use config files or config strings in your final code?
To make things more interesting, if I use AMGX_config_create(&m_config, "config_version=2,algorithm=AGGREGATION,selector=SIZE_2,print_grid_stats=1,max_iters=1000,monitor_residual=1,obtain_timings=1,print_solve_stats=1,print_grid_stats=1"); it refuses to print the grid and solver stats but accepts the rest of the inputs.
This happens because AMGX parsed this config as plain text, which has slightly different syntax: the grid and solver stats parameters need to be attached explicitly to a solver. Again, I strongly recommend using JSON configs, but if you wish I can explain how to use plain-text configs.
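For example, with the plain-text syntax those flags have to be attached to a named solver scope, roughly like this (an untested sketch that follows the scoped "s1:" syntax shown later in this thread):

AMGX_config_create(&m_config,
    "config_version=2,"
    " solver(amg)=AMG,"
    " amg:algorithm=AGGREGATION,"
    " amg:selector=SIZE_2,"
    " amg:max_iters=1000,"
    " amg:monitor_residual=1,"
    " amg:obtain_timings=1,"
    " amg:print_solve_stats=1,"
    " amg:print_grid_stats=1");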
Hi Marsaev, my first choice was also the JSON file. The issue is that when I integrate AMGX into the bigger code using the JSON file, it does not iterate: I used the config file from your example, and standalone it works perfectly, but added to the bigger code the number of iterations stays at 0 with the same JSON file that works standalone. I think by now I have almost memorized your entire AMGX_Reference.pdf file. I am using gcc/8.2.0 with mpich/3.2.1 for MPI and cuda/10.2.89.
You can also provide JSON as a string. Here is an example extension of the amgx_capi example that reads a JSON file and provides its contents to AMGX_config_create(); it works identically to AMGX_config_create_from_file() for me:
{
    // read the whole JSON config file into a fixed-size string buffer
    FILE* fjson = fopen(argv[pidz + 1], "r");
    const size_t config_max_len = 4096;
    char config[config_max_len];
    config[0] = '\0';
    char* read_ptr = config;
    size_t len = config_max_len - 1;
    while (fgets(read_ptr, (int)len, fjson)) {
        len = MAX(config_max_len - strlen(config) - 1, 0);
        read_ptr = config + strlen(config);
    }
    fclose(fjson);
    // hand the JSON text to AMGX exactly as if it were a config string
    AMGX_SAFE_CALL(AMGX_config_create(&cfg, config));
}
You can also hardcode the config. For example, here I copy-pasted the AGGREGATION_JACOBI.json config and passed it to AMGX_config_create():
{
const char config[] =
"{ "
" \"config_version\": 2, "
" \"determinism_flag\": 1, "
" \"solver\": { "
" \"print_grid_stats\": 1, "
" \"algorithm\": \"AGGREGATION\", "
" \"obtain_timings\": 1, "
" \"solver\": \"AMG\", "
" \"smoother\": \"BLOCK_JACOBI\", "
" \"print_solve_stats\": 1, "
" \"presweeps\": 2, "
" \"selector\": \"SIZE_2\", "
" \"convergence\": \"RELATIVE_MAX_CORE\", "
" \"coarsest_sweeps\": 2, "
" \"max_iters\": 100, "
" \"monitor_residual\": 1, "
" \"min_coarse_rows\": 2, "
" \"relaxation_factor\": 0.75, "
" \"scope\": \"main\", "
" \"max_levels\": 1000, "
" \"postsweeps\": 2, "
" \"tolerance\": 0.1, "
" \"norm\": \"L1\", "
" \"cycle\": \"V\" "
" } "
"} ";
AMGX_SAFE_CALL(AMGX_config_create(&cfg, config));
}
and the output is identical to what's above. For everything here I used the same gcc and a Release build.
Would any of those options work for you in your app?
Interesting. I copied your hard-coded version and the error is still the same as before: Error parsing parameter string: Incorrect config entry (number of equal signs is not 1) : { "config_version": 2
0(D) 10000 49600 0.000496 0.000848
1(D) 4717 27857 0.00125 0.00083
2(D) 2145 13539 0.00294 0.000397
3(D) 990 6506 0.00664 0.000189
4(D) 456 3002 0.0144 8.73e-05
5(D) 213 1387 0.0306 3.74e-05
--------------------------------------------------------------
Grid Complexity: 1.8521
Operator Complexity: 2.05425
Total Memory Usage: 0.00238962 GB
--------------------------------------------------------------
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.909485 2.199999e+03
--------------------------------------------------------------
Total Iterations: 0
Avg Convergence Rate: 1.000000
Final Residual: 2.199999e+03
Total Reduction in Residual: 1.000000e+00
Maximum Memory Usage: 0.909 GB
--------------------------------------------------------------
Total Time: 0.00571098 setup: 0.00561062 s solve: 0.000100352 s solve(per iteration): 0 s
Just to check, when you build Release AMGX, can you check that RAPIDJSON_DEFINED is in the build C flags? (make VERBOSE=1 for any AMGX library file.)
0(D) 12831 87115 0.000529 0.00135
1(D) 5900 52052 0.0015 0.00142
2(D) 2770 29606 0.00386 0.000785
3(D) 1314 15616 0.00904 0.000407
4(D) 621 7745 0.0201 0.000201
5(D) 297 3695 0.0419 9.58e-05
6(D) 140 1658 0.0846 4.13e-05
--------------------------------------------------------------
Grid Complexity: 1.86057
Operator Complexity: 2.26697
Total Memory Usage: 0.00430242 GB
--------------------------------------------------------------
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.909485 1.071363e+05
--------------------------------------------------------------
Total Iterations: 0
Avg Convergence Rate: 1.000000
Final Residual: 1.071363e+05
**Total Reduction in Residual: 1.000000e+00**
Maximum Memory Usage: 0.909 GB
--------------------------------------------------------------
Total Time: 0.0183321 setup: 0.0182201 s solve: 0.000112064 s solve(per iteration): 0 s
However, if I use
AMGX_config_create(&m_config,"config_version=2, solver(s1)=FGMRES, s1:preconditioner=BLOCK_JACOBI ,s1:max_iters=100,s1:convergence=RELATIVE_INI_CORE ,s1:norm=L2, s1:tolerance=1e-3 ,s1:monitor_residual=1,s1:gmres_n_restart=20");
it iterates and converges as follows:
CLASSICAL is not supported in AMGX_read_system_maps_one_ring. (This is also interesting, since for FGMRES there is no AMG to choose between CLASSICAL and AGGREGATION.)
res( 1 )11.2807
res( 2 )6.69336
res( 3 )4.74958
res( 4 )3.69872
res( 5 )2.9753
res( 6 )2.25479
res( 7 )1.69783
res( 8 )1.29488
res( 9 )0.989278
res( 10 )0.742286
res( 11 )0.579798
res( 12 )0.47657
res( 13 )0.311112
res( 14 )0.195809
This definitely reduces the residual. Again, both work perfectly standalone; the problem occurs when I integrate it into the real code.
I hope this helps with tracking down the issue. I can also talk on Zoom or whatever you prefer if you think it might help. To me it seems that, for whatever reason, the iteration kernel is not being launched.
Did you get a chance to look at the above issue?
Hey Jaberh,
I suspect none of your configs are parsed as expected. It would definitely help to print the exact solver components during solver construction just to confirm the solver structure, but there is no such functionality at the moment. Can we try to identify the issue step by step?
Can you try running the built-in example amgx_capi on the 2cubes_sphere.mtx matrix from https://suitesparse-collection-website.herokuapp.com/MM/Um/2cubes_sphere.tar.gz? This would be the expected output:
$ examples/amgx_capi -m /tmp/2cubes_sphere/2cubes_sphere.mtx -c ../core/configs/AGGREGATION_JACOBI.json
AMGX version 2.1.0.131-opensource
Built on Jun 5 2020, 16:50:40
Compiled with CUDA Runtime 10.2, using CUDA driver 11.0
Warning: No mode specified, using dDDI by default.
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
AMG Grid:
Number of Levels: 10
LVL ROWS NNZ SPRSTY Mem (GB)
--------------------------------------------------------------
0(D) 101492 1647264 0.00016 0.0214
1(D) 48283 834307 0.000358 0.0208
2(D) 23049 429889 0.000809 0.0106
3(D) 11038 221914 0.00182 0.00545
4(D) 5313 114457 0.00405 0.00279
5(D) 2557 58789 0.00899 0.00143
6(D) 1242 29810 0.0193 0.000722
7(D) 602 14574 0.0402 0.000353
8(D) 294 6970 0.0806 0.000169
9(D) 143 3261 0.159 7.72e-05
--------------------------------------------------------------
Grid Complexity: 1.91161
Operator Complexity: 2.0405
Total Memory Usage: 0.0638129 GB
--------------------------------------------------------------
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.900269 1.014920e+05
0 0.900269 1.633251e+03 0.0161
--------------------------------------------------------------
Total Iterations: 1
Avg Convergence Rate: 0.0161
Final Residual: 1.633251e+03
Total Reduction in Residual: 1.609241e-02
Maximum Memory Usage: 0.900 GB
--------------------------------------------------------------
Total Time: 0.0139016
setup: 0.0120689 s
solve: 0.00183274 s
solve(per iteration): 0.00183274 s
For the same config and input data, but with two ranks, it should be:
$ mpirun -n 2 examples/amgx_mpi_capi -m /tmp/2cubes_sphere/2cubes_sphere.mtx -c ../core/configs/AGGREGATION_JACOBI.json
Process 0 selecting device 0
Process 1 selecting device 1
AMGX version 2.1.0.131-opensource
Built on Jun 19 2020, 20:14:42
Compiled with CUDA Runtime 10.2, using CUDA driver 11.0
Warning: No mode specified, using dDDI by default.
Cannot read file as JSON object, trying as AMGX config
Converting config string to current config version
Parsing configuration string: exception_handling=1 ;
Warning: No mode specified, using dDDI by default.
Using Normal MPI (Hostbuffer) communicator...
Reading matrix dimensions in file: /tmp/2cubes_sphere/2cubes_sphere.mtx
Reading data...
RHS vector was not found. Using RHS b=[1,…,1]^T
Solution vector was not found. Setting initial solution to x=[0,…,0]^T
Finished reading
Using Normal MPI (Hostbuffer) communicator...
Using Normal MPI (Hostbuffer) communicator...
Using Normal MPI (Hostbuffer) communicator...
AMG Grid:
Number of Levels: 9
LVL ROWS NNZ SPRSTY Mem (GB)
--------------------------------------------------------------
0(D) 101492 1647264 0.00016 0.022
1(D) 48168 832024 0.000359 0.0213
2(D) 22955 428089 0.000812 0.0109
3(D) 10967 220573 0.00183 0.0056
4(D) 5277 113673 0.00408 0.00288
5(D) 2541 57753 0.00894 0.00147
6(D) 1221 28739 0.0193 0.000734
7(D) 588 13920 0.0403 0.00036
8(D) 285 6605 0.0813 0.000166
--------------------------------------------------------------
Grid Complexity: 1.9065
Operator Complexity: 2.03285
Total Memory Usage: 0.0653328 GB
--------------------------------------------------------------
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.902222 1.014920e+05
0 0.902222 1.637347e+03 0.0161
--------------------------------------------------------------
Total Iterations: 1
Avg Convergence Rate: 0.0161
Final Residual: 1.637347e+03
Total Reduction in Residual: 1.613277e-02
Maximum Memory Usage: 0.902 GB
--------------------------------------------------------------
Total Time: 0.0273681
setup: 0.0240261 s
solve: 0.00334202 s
solve(per iteration): 0.00334202 s
Hi Marsaev, thank you for the detailed example. I will give this a try. However, the standalone code works fine with numerous examples and configs; I verified those about a month ago. The issue is when I integrate the interface and AMGX into a real industrial-scale code: for whatever reason, in parallel mode, it does not perform iterations, while the same config works in the unit test with no problem. I am trying to schedule a meeting with NVIDIA through our company, as I think it is easier to demonstrate what is going on in a meeting. I will let you know once I test this later today.
Is it possible to somehow load this example matrix into your industrial code to check whether the same solve can be repeated in your app? I.e., if you had something like this on each rank:
AMGX_config_create_from_file( ... ,"AGGREGATION_JACOBI.json")
AMGX_matrix_upload_distributed(...)
AMGX_vector_upload(...)
AMGX_solver_setup(...)
AMGX_solver_solve(...)
replace it with:
AMGX_config_create_from_file( ... ,"AGGREGATION_JACOBI.json")
AMGX_read_system("2cubes_sphere.mtx")
AMGX_solver_setup(...)
AMGX_solver_solve(...)
AMGX_finalize(...)
exit(0)
and check that each rank solves this system identically and that each rank's AMGX output is similar to the output of the standalone example. If your distributed setup is homogeneous, there shouldn't be any major differences (IIRC a possible difference is the parallel reduction result, but it shouldn't affect the solve drastically).
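A minimal sketch of that reproducer (assumptions: single GPU, dDDI mode, and the AMGX_SAFE_CALL error-check macro from the examples; in the MPI build you would keep your existing AMGX_resources_create call with the communicator instead of AMGX_resources_create_simple):

#include <amgx_c.h>

int main()
{
    AMGX_config_handle    cfg;
    AMGX_resources_handle rsrc;
    AMGX_matrix_handle    A;
    AMGX_vector_handle    b, x;
    AMGX_solver_handle    solver;

    AMGX_SAFE_CALL(AMGX_initialize());
    AMGX_SAFE_CALL(AMGX_config_create_from_file(&cfg, "AGGREGATION_JACOBI.json"));
    AMGX_SAFE_CALL(AMGX_resources_create_simple(&rsrc, cfg));

    AMGX_SAFE_CALL(AMGX_matrix_create(&A, rsrc, AMGX_mode_dDDI));
    AMGX_SAFE_CALL(AMGX_vector_create(&b, rsrc, AMGX_mode_dDDI));
    AMGX_SAFE_CALL(AMGX_vector_create(&x, rsrc, AMGX_mode_dDDI));
    AMGX_SAFE_CALL(AMGX_solver_create(&solver, rsrc, AMGX_mode_dDDI, cfg));

    // reads the matrix (and RHS/solution, if present) from the MatrixMarket file
    AMGX_SAFE_CALL(AMGX_read_system(A, b, x, "2cubes_sphere.mtx"));

    AMGX_SAFE_CALL(AMGX_solver_setup(solver, A));
    AMGX_SAFE_CALL(AMGX_solver_solve(solver, b, x));

    // teardown
    AMGX_SAFE_CALL(AMGX_solver_destroy(solver));
    AMGX_SAFE_CALL(AMGX_vector_destroy(x));
    AMGX_SAFE_CALL(AMGX_vector_destroy(b));
    AMGX_SAFE_CALL(AMGX_matrix_destroy(A));
    AMGX_SAFE_CALL(AMGX_resources_destroy(rsrc));
    AMGX_SAFE_CALL(AMGX_config_destroy(cfg));
    AMGX_SAFE_CALL(AMGX_finalize());
    return 0;
}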
If the output is somehow different - can you try logging every AMGX API call, just to check the order of the calls and their parameters? Sorry, there is no built-in logging of API calls, but if you wrap AMGX calls with error checking you can do something like:
#define AMGX_SAFE_CALL(rc) \
    std::cout << #rc << std::endl; \
    if (AMGX_RC_OK != (rc)) .....
AMGX_SAFE_CALL( AMGX_config_create_from_file(...) );
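For completeness, a fleshed-out version of that wrapper could look like this (a sketch, not the macro shipped with the examples; adjust the error handling to taste):

#include <iostream>
#include <amgx_c.h>

// Prints the literal call text before evaluating it, then checks the return code.
#define AMGX_SAFE_CALL(call)                                                   \
    do {                                                                       \
        std::cout << #call << std::endl;                                       \
        AMGX_RC amgx_rc_ = (call);                                             \
        if (amgx_rc_ != AMGX_RC_OK) {                                          \
            std::cerr << "AMGX call failed with return code " << amgx_rc_      \
                      << std::endl;                                            \
        }                                                                      \
    } while (0)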
Hi Marsaev, I will work on this today. I have printed out all the return values from every function as per your suggestion; they are all 0's. I will try the above matrix ASAP.
I have printed out all the return values as per your suggestion from every function, they are all 0's
It's great that there are no errors, but it would also be great to see the actual order of the calls with parameters, hence the macro expansion with the #rc stringification.
0(D) 10000 49600 0.000496 0.000848
1(D) 4730 25612 0.00114 0.000781
2(D) 2182 13218 0.00278 0.000392
3(D) 1000 6524 0.00652 0.00019
4(D) 464 3058 0.0142 8.88e-05
5(D) 211 1365 0.0307 3.68e-05
--------------------------------------------------------------
Grid Complexity: 1.8587
Operator Complexity: 2.00357
Total Memory Usage: 0.00233688 GB
--------------------------------------------------------------
**Ini 0.909485 9.969297e-02**
--------------------------------------------------------------
Total Iterations: 0
Avg Convergence Rate: 1.000000
Final Residual: 9.969297e-02
Total Reduction in Residual: 1.000000e+00
Maximum Memory Usage: 0.909 GB
--------------------------------------------------------------
Total Time: 0.0314893 setup: 0.0293952 s solve: 0.00209405 s solve(per iteration): 0 s
again "0" iters
0(D) 10000 49600 0.000496 0.000848
1(D) 4730 25612 0.00114 0.000781
2(D) 2182 13218 0.00278 0.000392
3(D) 1000 6524 0.00652 0.00019
4(D) 464 3058 0.0142 8.88e-05
5(D) 211 1365 0.0307 3.68e-05
--------------------------------------------------------------
Grid Complexity: 1.8587
Operator Complexity: 2.00357
Total Memory Usage: 0.00233688 GB
--------------------------------------------------------------
**Ini 0.909485 9.969297e-02**
0 0.909485 9.867779e-02 0.9898
1 0.9095 3.703747e-02 0.3753
2 0.9095 1.228826e-02 0.3318
3 0.9095 4.846841e-03 0.3944
4 0.9095 2.128492e-03 0.4392
5 0.9095 9.672328e-04 0.4544
6 0.9095 4.243034e-04 0.4387
7 0.9095 1.947642e-04 0.4590
8 0.9095 8.401128e-05 0.4313
9 0.9095 3.381623e-05 0.4025
10 0.9095 1.696738e-05 0.5018
11 0.9095 7.558131e-06 0.4455
12 0.9095 3.316206e-06 0.4388
13 0.9095 1.404204e-06 0.4234
14 0.9095 6.186404e-07 0.4406
15 0.9095 2.787829e-07 0.4506
16 0.9095 1.260950e-07 0.4523
17 0.9095 4.818015e-08 0.3821
18 0.9095 1.409761e-08 0.2926
19 0.9095 8.047044e-09 0.5708
20 0.9095 5.687245e-09 0.7067
21 0.9095 3.133990e-09 0.5511
22 0.9095 1.435876e-09 0.4582
23 0.9095 5.912854e-10 0.4118
24 0.9095 2.313654e-10 0.3913
25 0.9095 9.721457e-11 0.4202
26 0.9095 3.978193e-11 0.4092
27 0.9095 1.710693e-11 0.4300
28 0.9095 7.397181e-12 0.4324
29 0.9095 3.755933e-12 0.5078
30 0.9095 2.206292e-12 0.5874
31 0.9095 1.030078e-12 0.4669
32 0.9095 4.370271e-13 0.4243
33 0.9095 1.771387e-13 0.4053
34 0.9095 7.296272e-14 0.4119
--------------------------------------------------------------
Total Iterations: 35
Avg Convergence Rate: 0.4501
Final Residual: 7.296272e-14
Total Reduction in Residual: 7.318742e-13
Maximum Memory Usage: 0.909 GB
--------------------------------------------------------------
Total Time: 0.159129 setup: 0.0357844 s solve: 0.123344 s solve(per iteration): 0.00352412 s
stat of solve 0
AMGX_vector_download(m_solution, dest)
err 8.38997e-05
As you can see, the grid data as well as the initial residual are identical.
Just to check - are there AMGX_initialize() and AMGX_finalize() calls in the code?
Yes, a singleton class is responsible for initialization and cleanup. As this library will be used by other developers, I wanted to prevent multiple calls. I did not put a safe_call around initialize and finalize, which is why they are not shown there.
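Roughly this pattern (a sketch with hypothetical names, not our actual class):

#include <amgx_c.h>

// Process-wide guard: AMGX_initialize()/AMGX_finalize() run exactly once,
// no matter how many components use the library.
class AmgxRuntime
{
public:
    static AmgxRuntime &instance()
    {
        static AmgxRuntime rt;   // constructed on first use, torn down at process exit
        return rt;
    }
    AmgxRuntime(const AmgxRuntime &) = delete;
    AmgxRuntime &operator=(const AmgxRuntime &) = delete;

private:
    AmgxRuntime()  { AMGX_initialize(); }
    ~AmgxRuntime() { AMGX_finalize(); }
};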
Alright, those API calls look good.
I made some progress today. There are double try/catch blocks inside the code and the real issue is caught within the code, but the error reported to the C API is incorrectly identified.
Do you suspect that this might be an issue related to something at a lower level than the programming, such as the build, CUDA installation, drivers, ...?
@Jaberh sorry, yesterday for some reason I couldn't add comments to this thread. I pushed a fix https://github.com/NVIDIA/AMGX/commit/7b4d431e67e9f86746166a4dae8de6434a78ac5a to the v2.1.x branch. Can you try the update?
Answering your question - no, this is purely an AMGX bug.
Hi marsaev, I am git pulling it right now and will let you know. Thanks for all your support and the replies.
I built this one, which has the latest commit. Unfortunately nothing changed: the unit test still works fine and the integrated version does not iterate. Same output as above.
commit 7b4d431e67e9f86746166a4dae8de6434a78ac5a
Author: Marat Arsaev <marsaev@nvidia.com>
Date:   Thu Jun 25 23:01:31 2020 +0300

    Disabling deferred tasks
Got it.
Can you check what solver status is returned with AMGX_solver_get_status(solver, &status); after AMGX_solver_solve?
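Something like this right after the solve call (a sketch; m_solver/m_rhs/m_solution are your handles, AMGX_SOLVE_STATUS comes from amgx_c.h):

AMGX_SOLVE_STATUS status;
AMGX_SAFE_CALL(AMGX_solver_solve_with_0_initial_guess(m_solver, m_rhs, m_solution));
AMGX_SAFE_CALL(AMGX_solver_get_status(m_solver, &status));
// 0 == AMGX_SOLVE_SUCCESS; see AMGX_SOLVE_STATUS in amgx_c.h for the failure/divergence codes
printf("solver status: %d\n", (int)status);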
Also, can you try adding "solver_verbose": 1 for both the solver and the smoother in the config, so something like this for AGGREGATION_JACOBI:
{
"config_version": 2,
"determinism_flag": 1,
"solver": {
"print_grid_stats": 1,
"algorithm": "AGGREGATION",
"obtain_timings": 1,
"solver": "AMG",
"smoother": {
"solver" : "BLOCK_JACOBI",
"scope" : "jacobi",
"solver_verbose" : 1
},
"print_solve_stats": 1,
"presweeps": 2,
"selector": "SIZE_2",
"convergence": "RELATIVE_MAX_CORE",
"coarsest_sweeps": 2,
"max_iters": 100,
"monitor_residual": 1,
"min_coarse_rows": 2,
"relaxation_factor": 0.75,
"scope": "main",
"max_levels": 1000,
"postsweeps": 2,
"tolerance": 0.1,
"norm": "L1",
"cycle": "V",
"solver_verbose" : 1
}
}
you should see something like this in the output:
---------------------------------------------------------------------------
Parameters for solver: AMG with scope name: main
AMG solver settings:
cycle_iters = 2
norm = L1
presweeps = 2
postsweeps = 2
max_levels = 1000
coarsen_threshold = 1
min_fine_rows = 1
min_coarse_rows = 2
coarse_solver_d: DENSE_LU_SOLVER with scope name default
coarse_solver_h: DENSE_LU_SOLVER with scope name default
max_iters = 100
scaling = NONE
norm = L1
convergence = RELATIVE_MAX_CORE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 1
print_grid_stats = 1
print_vis_data = 0
monitor_residual = 1
store_res_history = 0
obtain_timings = 1
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi
relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------
The info for jacobi should be repeated for each AMG level.
Hi Marat. Sure, I have been checking the solver return value, which is 0. Here is the output using your new config file:
global initializer called
AMGX version 2.1.0.131-opensource
Built on Jun 25 2020, 15:44:52
Compiled with CUDA Runtime 10.2, using CUDA driver 10.2
m_rank 0
m_nRank 1
m_ndevice 2
m_nHost 1
Warning: using only 1 of 2 available GPUs
AMGX_config_create_from_file(&m_config, param_file)
AMGX_resources_create(&m_resources, m_config, static_cast<void*>(&m_amgx_comm), m_max_device_per_host, (const int*)(&m_device_id))
AMGX_matrix_create(&m_matrix, m_resources, m_mode)
AMGX_vector_create(&m_rhs, m_resources, m_mode)
AMGX_vector_create(&m_solution, m_resources, m_mode)
AMGX_solver_create(&m_solver, m_resources, m_mode, m_config)
sqrt 1
AMGX_matrix_comm_from_maps_one_ring(m_matrix, m_allocated_halo_depth, m_num_nbrs, m_nbrs, m_send_size, m_send_map, m_recv_size, m_recv_map)
AMGX_matrix_upload_all(m_matrix, m_n, m_nnz, m_block_dimx, m_block_dimy, (const int*)row_ptrs, (const int*)col_indices, (const double*)data, (const double*)diag_data)
AMGX_vector_bind(m_rhs, m_matrix)
AMGX_vector_bind(m_solution, m_matrix)
AMGX_vector_upload(m_rhs, m_n_plus_ghost, m_block_size, rhs)
AMGX_vector_upload(m_solution, m_n_plus_ghost, m_block_size, rhs)
AMGX_solver_setup(m_solver, m_matrix)
---------------------------------------------------------------------------
Parameters for solver: AMG with scope name: main
AMG solver settings:
cycle_iters = 2
norm = L1
presweeps = 2
postsweeps = 2
max_levels = 1000
coarsen_threshold = 1
min_fine_rows = 1
min_coarse_rows = 2
coarse_solver_d: DENSE_LU_SOLVER with scope name default
coarse_solver_h: DENSE_LU_SOLVER with scope name default
max_iters = 100
scaling = NONE
norm = L1
convergence = RELATIVE_MAX_CORE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 1
print_grid_stats = 1
print_vis_data = 0
monitor_residual = 1
store_res_history = 0
obtain_timings = 1
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi
relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi
relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi
relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi
relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi
relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------
AMG Grid:
Number of Levels: 6
LVL ROWS NNZ SPRSTY Mem (GB)
--------------------------------------------------------------
0(D) 10000 49600 0.000496 0.000848
1(D) 4730 25612 0.00114 0.000781
2(D) 2182 13218 0.00278 0.000392
3(D) 1000 6524 0.00652 0.00019
4(D) 464 3058 0.0142 8.88e-05
5(D) 211 1365 0.0307 3.68e-05
--------------------------------------------------------------
Grid Complexity: 1.8587
Operator Complexity: 2.00357
Total Memory Usage: 0.00233688 GB
--------------------------------------------------------------
AMGX_solver_solve_with_0_initial_guess(m_solver, m_rhs, m_solution)
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.911438 7.998657e+00
--------------------------------------------------------------
Total Iterations: 0
Avg Convergence Rate: 1.000000
Final Residual: 7.998657e+00
Total Reduction in Residual: 1.000000e+00
Maximum Memory Usage: 0.911 GB
--------------------------------------------------------------
Total Time: 0.0246834
setup: 0.0222985 s
solve: 0.00238486 s
solve(per iteration): 0 s
**stat of solve 0**
AMGX_vector_download(m_solution, dest)
err 1.00176
global destructor called
Looks all right. You are saying that it does iterate with the same config with the same matrix in the unit test, but not in the app code, right?
Exactly, here is the same config for the unit test:
global initializer called.
AMGX version 2.1.0.131-opensource
Built on Jun 25 2020, 15:44:52
Compiled with CUDA Runtime 10.2, using CUDA driver 10.2
m_rank 0
m_nRank 1
m_ndevice 2
m_nHost 1
Warning: using only 1 of 2 available GPUs
AMGX_config_create_from_file(&m_config, param_file)
AMGX_resources_create(&m_resources, m_config, static_cast<void*>(&m_amgx_comm), m_max_device_per_host, (const int*)(&m_device_id))
AMGX_matrix_create(&m_matrix, m_resources, m_mode)
AMGX_vector_create(&m_rhs, m_resources, m_mode)
AMGX_vector_create(&m_solution, m_resources, m_mode)
AMGX_solver_create(&m_solver, m_resources, m_mode, m_config)
sqrt 1
AMGX_matrix_comm_from_maps_one_ring(m_matrix, m_allocated_halo_depth, m_num_nbrs, m_nbrs, m_send_size, m_send_map, m_recv_size, m_recv_map)
AMGX_matrix_upload_all(m_matrix, m_n, m_nnz, m_block_dimx, m_block_dimy, (const int*)row_ptrs, (const int*)col_indices, (const double*)data, (const double*)diag_data)
AMGX_vector_bind(m_rhs, m_matrix)
AMGX_vector_bind(m_solution, m_matrix)
AMGX_vector_upload(m_rhs, m_n_plus_ghost, m_block_size, rhs)
AMGX_vector_upload(m_solution, m_n_plus_ghost, m_block_size, rhs)
AMGX_solver_setup(m_solver, m_matrix)
---------------------------------------------------------------------------
Parameters for solver: AMG with scope name: main
AMG solver settings:
cycle_iters = 2
norm = L1
presweeps = 2
postsweeps = 2
max_levels = 1000
coarsen_threshold = 1
min_fine_rows = 1
min_coarse_rows = 2
coarse_solver_d: DENSE_LU_SOLVER with scope name default
coarse_solver_h: DENSE_LU_SOLVER with scope name default
max_iters = 100
scaling = NONE
norm = L1
convergence = RELATIVE_MAX_CORE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 1
print_grid_stats = 1
print_vis_data = 0
monitor_residual = 1
store_res_history = 0
obtain_timings = 1
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi
relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi
relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi
relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi
relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------
---------------------------------------------------------------------------
Parameters for solver: BLOCK_JACOBI with scope name: jacobi
relaxation_factor= 0.9
max_iters = 100
scaling = NONE
norm = L2
convergence = ABSOLUTE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 0
print_grid_stats = 0
print_vis_data = 0
monitor_residual = 0
store_res_history = 0
obtain_timings = 0
---------------------------------------------------------------------------
AMG Grid:
Number of Levels: 6
LVL ROWS NNZ SPRSTY Mem (GB)
--------------------------------------------------------------
0(D) 10000 49600 0.000496 0.000848
1(D) 4730 25612 0.00114 0.000781
2(D) 2182 13218 0.00278 0.000392
3(D) 1000 6524 0.00652 0.00019
4(D) 464 3058 0.0142 8.88e-05
5(D) 211 1365 0.0307 3.68e-05
--------------------------------------------------------------
Grid Complexity: 1.8587
Operator Complexity: 2.00357
Total Memory Usage: 0.00233688 GB
--------------------------------------------------------------
AMGX_solver_solve_with_0_initial_guess(m_solver, m_rhs, m_solution)
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.911438 7.998657e+00
0 0.911438 8.579094e+00 1.0726
1 0.9114 8.166217e+00 0.9519
2 0.9114 7.458036e+00 0.9133
3 0.9114 6.721179e+00 0.9012
4 0.9114 6.026819e+00 0.8967
5 0.9114 5.392025e+00 0.8947
6 0.9114 4.818757e+00 0.8937
7 0.9114 4.304031e+00 0.8932
8 0.9114 3.842848e+00 0.8928
9 0.9114 3.430228e+00 0.8926
10 0.9114 3.061298e+00 0.8924
11 0.9114 2.731515e+00 0.8923
12 0.9114 2.436909e+00 0.8921
13 0.9114 2.173792e+00 0.8920
14 0.9114 1.938851e+00 0.8919
15 0.9114 1.729142e+00 0.8918
16 0.9114 1.541975e+00 0.8918
17 0.9114 1.374959e+00 0.8917
18 0.9114 1.225957e+00 0.8916
19 0.9114 1.093047e+00 0.8916
20 0.9114 9.744872e-01 0.8915
21 0.9114 8.687460e-01 0.8915
22 0.9114 7.744420e-01 0.8914
--------------------------------------------------------------
Total Iterations: 23
Avg Convergence Rate: 0.9035
Final Residual: 7.744420e-01
Total Reduction in Residual: 9.682150e-02
Maximum Memory Usage: 0.911 GB
--------------------------------------------------------------
Total Time: 0.0520445
setup: 0.0229698 s
solve: 0.0290748 s
solve(per iteration): 0.00126412 s
stat of solve 0
AMGX_vector_download(m_solution, dest)
err 0.0657659
global destructor called
I don't have any idea why this might be happening without hands-on access to the code, considering that the standalone execution functions properly.
Is it the same for both Release and Debug? If it happens in Debug, can you try tracing through the solve process with gdb and see where the early exit happens? The function of interest would be https://github.com/NVIDIA/AMGX/blob/d0019e5d32e99e7d679b3b773cf16b6f8e7da6f9/base/src/solvers/solver.cu#L589 and, in particular, this solve iteration loop: https://github.com/NVIDIA/AMGX/blob/d0019e5d32e99e7d679b3b773cf16b6f8e7da6f9/base/src/solvers/solver.cu#L798
OK, this is very odd to me as well. I am building a debug version and will monitor that function. It might be beneficial to share this with a build specialist at NVIDIA as well. Spasiba.
For debugging purposes you can set the coarse solver and the smoother to NOSOLVER, so that only the AMG solver goes through this code path. AMG would still iterate in that case, but the residual should not decrease - this should be enough to debug the issue of it not iterating at all. For example:
{
"config_version": 2,
"determinism_flag": 1,
"solver": {
"print_grid_stats": 1,
"algorithm": "AGGREGATION",
"obtain_timings": 1,
"solver": "AMG",
"smoother": {
"solver" : "NOSOLVER",
"scope" : "jacobi"
},
"coarse_solver" :
{
"solver" : "NOSOLVER",
"scope" : "dense"
},
"print_solve_stats": 1,
"presweeps": 2,
"selector": "SIZE_2",
"convergence": "RELATIVE_MAX_CORE",
"coarsest_sweeps": 2,
"max_iters": 5,
"monitor_residual": 1,
"min_coarse_rows": 2,
"relaxation_factor": 0.75,
"scope": "main",
"max_levels": 1000,
"postsweeps": 2,
"tolerance": 0.1,
"norm": "L1",
"cycle": "V"
}
}
I quickly tried this, no luck. I will try it in the debugger as well; I was going to put the breakpoint in the loop you mentioned above.
Number of Levels: 12
LVL ROWS NNZ SPRSTY Mem (GB)
--------------------------------------------------------------
0(D) 10000 49600 0.000496 0.000848
1(D) 4730 25612 0.00114 0.000781
2(D) 2182 13218 0.00278 0.000392
3(D) 1000 6524 0.00652 0.00019
4(D) 464 3058 0.0142 8.88e-05
5(D) 211 1365 0.0307 3.98e-05
6(D) 97 609 0.0647 1.79e-05
7(D) 45 269 0.133 8.03e-06
8(D) 21 117 0.265 3.58e-06
9(D) 10 50 0.5 1.59e-06
10(D) 5 21 0.84 7.15e-07
11(D) 2 4 1 1.79e-07
--------------------------------------------------------------
Grid Complexity: 1.8767
Operator Complexity: 2.02514
Total Memory Usage: 0.00237192 GB
--------------------------------------------------------------
AMGX_solver_solve_with_0_initial_guess(m_solver, m_rhs, m_solution)
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.802063 7.998657e+00
--------------------------------------------------------------
Total Iterations: 0
Avg Convergence Rate: 1.000000
Final Residual: 7.998657e+00
Total Reduction in Residual: 1.000000e+00
Maximum Memory Usage: 0.802 GB
--------------------------------------------------------------
Total Time: 0.158886
setup: 0.14185 s
solve: 0.0170352 s
solve(per iteration): 0 s
stat of solve 0
Hi Marat. So I am trying to build this in debug mode, so I can track down the iterations in cuda-gdb, but I get the following error:
/lib/../lib64/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_GOTPCREL against undefined symbol `__gmon_start__'
tools/centos/6/gcc/8.2.0/lib/gcc/x86_64-pc-linux-gnu/8.2.0/crtbeginS.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in libamgxsh.so
crtstuff.c:(.text+0x16): relocation truncated to fit: R_X86_64_GOTPCREL against undefined symbol `_ITM_deregisterTMCloneTable'
/tools/centos/6/gcc/8.2.0/lib/gcc/x86_64-pc-linux-gnu/8.2.0/crtbeginS.o: In function `register_tm_clones':
crtstuff.c:(.text+0x33): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x3a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in libamgxsh.so
crtstuff.c:(.text+0x57): relocation truncated to fit: R_X86_64_GOTPCREL against undefined symbol `_ITM_registerTMCloneTable'
/tools/centos/6/gcc/8.2.0/lib/gcc/x86_64-pc-linux-gnu/8.2.0/crtbeginS.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x72): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x7d): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__cxa_finalize@@GLIBC_2.2.5' defined in .text section in /lib64/libc.so.6
crtstuff.c:(.text+0x8d): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /tools/centos/6/gcc/8.2.0/lib/gcc/x86_64-pc-linux-gnu/8.2.0/crtbeginS.o
crtstuff.c:(.text+0x99): additional relocation overflows omitted from the output
libamgxsh.so: PC-relative offset overflow in PLT entry for `_ZN9__gnu_cxx13new_allocatorISt13_Rb_tree_nodeISt4pairIKPN4amgx11CWrapHandleIP25AMGX_vector_handle_structNS3_6VectorINS3_14TemplateConfigIL16AMGX_MemorySpace1EL17AMGX_VecPrecision0EL17AMGX_MatPrecision1EL17AMGX_IndPrecision2EEEEEEESt10shared_ptrISF_EEEE8allocateEmPK
It seems that the produced code is too large for the linker to process.
There are a number of ways to reduce the amount of generated code, but that might be a little too adventurous :) I can try to provide you a debug build on centos:centos6.9 with devtoolset-8 - would that work? Which CUDA/MPI are you compiling against?
openmpi/3.1.3, cuda/10.2.89
Here is a comparison of what the unit test and the real code output when calling the same method:
AMG solver settings:
cycle_iters = 2
norm = L2
presweeps = 2
postsweeps = 2
max_levels = 1000
coarsen_threshold = 1
min_fine_rows = 1
min_coarse_rows = 2
coarse_solver_d: DENSE_LU_SOLVER with scope name default
coarse_solver_h: DENSE_LU_SOLVER with scope name default
max_iters = 2
scaling = NONE
norm = L2
convergence = RELATIVE_MAX_CORE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 1
print_grid_stats = 1
print_vis_data = 0
monitor_residual = 1
store_res_history = 1
obtain_timings = 1
---------------------------------------------------------------------------
AMG Grid:
Number of Levels: 6
LVL ROWS NNZ SPRSTY Mem (GB)
--------------------------------------------------------------
0(D) 10000 49600 0.000496 0.000848
1(D) 4730 25612 0.00114 0.000781
2(D) 2182 13218 0.00278 0.000392
3(D) 1000 6524 0.00652 0.00019
4(D) 464 3058 0.0142 8.88e-05
5(D) 211 1365 0.0307 3.68e-05
--------------------------------------------------------------
Grid Complexity: 1.8587
Operator Complexity: 2.00357
Total Memory Usage: 0.00233688 GB
--------------------------------------------------------------
AMGX_solver_solve_with_0_initial_guess(m_solver, m_rhs, m_solution)
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.911438 9.969297e-02
max iters ???????????? 2
called converged() line 295 monitor convergence 1
converged(): 1
**done 1**
----------------------------------------------------------------------------------
AMG solver settings:
cycle_iters = 2
norm = L2
presweeps = 2
postsweeps = 2
max_levels = 1000
coarsen_threshold = 1
min_fine_rows = 1
min_coarse_rows = 2
coarse_solver_d: DENSE_LU_SOLVER with scope name default
coarse_solver_h: DENSE_LU_SOLVER with scope name default
max_iters = 2
scaling = NONE
norm = L2
convergence = RELATIVE_MAX_CORE
solver_verbose= 1
use_scalar_norm = 0
print_solve_stats = 1
print_grid_stats = 1
print_vis_data = 0
monitor_residual = 1
store_res_history = 1
obtain_timings = 1
---------------------------------------------------------------------------
AMG Grid:
Number of Levels: 6
LVL ROWS NNZ SPRSTY Mem (GB)
--------------------------------------------------------------
0(D) 10000 49600 0.000496 0.000848
1(D) 4730 25612 0.00114 0.000781
2(D) 2182 13218 0.00278 0.000392
3(D) 1000 6524 0.00652 0.00019
4(D) 464 3058 0.0142 8.88e-05
5(D) 211 1365 0.0307 3.68e-05
--------------------------------------------------------------
Grid Complexity: 1.8587
Operator Complexity: 2.00357
Total Memory Usage: 0.00233688 GB
--------------------------------------------------------------
AMGX_solver_solve_with_0_initial_guess(m_solver, m_rhs, m_solution)
iter Mem Usage (GB) residual rate
--------------------------------------------------------------
Ini 0.911438 9.969297e-02
max iters ???????????? 2
called converged() monitor convergence 1
converged(): 0
**done 0**
I also noticed that this function
template<class TConfig>
bool AbsoluteConvergence<TConfig>::convergence_update_and_check(const PODVec_h &nrm, const PODVec_h &nrm_ini)
{
printf("Check tolerance: %16.16lf norm_size %d\n", this->m_tolerance, (int)nrm.size());
bool res_converged = true;
bool res_converged_rel = true;
for (int i = 0; i < nrm.size(); i++)
{
bool conv = nrm[i] < this->m_tolerance;
res_converged = res_converged && conv;
bool conv_rel = nrm[i] < Epsilon_conv<ValueTypeB>::value() * nrm_ini[i];
res_converged_rel = res_converged_rel && conv_rel;
printf("nrm %lf nrm_ini %lf Epsilon_conv %lf\n", nrm[i], nrm_ini[i], Epsilon_conv<ValueTypeB>::value());
}
// printf("res_converged_rel %d \n", res_converged_rel);
if (res_converged_rel)
{
std::stringstream ss;
ss << "Relative residual has reached machine precision" << std::endl;
amgx_output(ss.str().c_str(), static_cast<int>(ss.str().length()));
return true;
}
return res_converged;
}
for both cases reports
Check tolerance: 0.0000000000000000 norm_size 0
Additionally, this method
`m_convergence->convergence_update_and_check(m_nrm, m_nrm_ini)`
returns true for the real code and false for the unit test with the same inputs.
Assuming that the initial residual is the same for both cases, converged() should return the same boolean in both. For the real code it considers the system converged although the initial residual is huge. If I remove the !done check from the iteration loop and hence force the given number of iterations, it converges to the desired tolerance. Let me know what you think, thanks.
solver_verbose does not print out the tolerance, so I tracked down why that function returns different results. Apparently the tolerance is not set correctly while reading: it shows up as "tol 1e+298", whereas the unit test reads it correctly as tol 1e-10. Everything else is read correctly. I think it would be a good idea to add the tolerance to the solver_verbose items, as it can catch errors like this pretty easily. Having said that, I am not sure why only this parameter is being read wrong. If I hard-code the tolerance (this->m_tolerance = 1.e-10;) it converges.
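To illustrate the effect (a toy sketch, not the actual AMGX code): with a tolerance read as ~1e+298, the convergence check is already satisfied before the first iteration, which matches the "Total Iterations: 0" output above.

#include <cstdio>

// Toy model of the behaviour described above.
static bool converged(double residual_norm, double tolerance)
{
    return residual_norm < tolerance;
}

int main()
{
    const double initial_residual = 7.998657;  // value from the log above
    const double bogus_tolerance  = 1e+298;    // what the broken double read produced
    const int    max_iters        = 100;

    int  iters = 0;
    bool done  = converged(initial_residual, bogus_tolerance);  // already true
    while (iters < max_iters && !done)
    {
        ++iters;                                                 // never reached
        done = converged(initial_residual, bogus_tolerance);
    }
    std::printf("Total Iterations: %d\n", iters);                // prints 0
    return 0;
}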
Great progress! Still, it would be good to understand where the incorrect read comes from - from the JSON parser, or whether it is modified somewhere. I have sent you a debug binary if you are willing to track this issue further using gdb.
Sure, I am happy to help debug the issue, as I am past overdue to get this to work. Did you email it?
All the doubles are wrong, basically tolerance and relaxation_factor. It should be related to Type AMG_Config::getParameter(const string &name, const string &current_scope), which returns doubles with extremely big exponents, e307... The JSON parser parses correctly, as the issue is still there when I hard-code the config parameters. Here is the output from the JSON object import; the read is ok:
name relaxation_factor value 7.5e+307
Parsing parameter with name "max_levels" of type Number
Parsing as int
Parsing parameter with name "postsweeps" of type Number
Parsing as int
Parsing parameter with name "tolerance" of type Number
Parsing as double
name tolerance value 1e+298
double GetDouble() const {
RAPIDJSON_ASSERT(IsNumber());
if ((flags_ & kDoubleFlag) != 0) return data_.n.d; // exact type, no conversion.
if ((flags_ & kIntFlag) != 0) return data_.n.i.i; // int -> double
if ((flags_ & kUintFlag) != 0) return data_.n.u.u; // unsigned -> double
if ((flags_ & kInt64Flag) != 0) return (double)data_.n.i64; // int64_t -> double (may lose precision)
RAPIDJSON_ASSERT((flags_ & kUint64Flag) != 0); return (double)data_.n.u64; // uint64_t -> double (may lose precision)
}
Just to confirm that the parsing is OK, I printed out the params string that is being processed in AMGX_ERROR AMG_Config::parse_json_file(const char *filename); for both the unit test and the real code it is identical:
AMGX_config_create_from_file(&m_config, param_file)
params {
"config_version": 2,
"determinism_flag": 1,
"solver": {
"print_grid_stats": 1,
"algorithm": "AGGREGATION",
"obtain_timings": 1,
"solver": "AMG",
"smoother": "BLOCK_JACOBI",
"print_solve_stats": 1,
"presweeps": 2,
"selector": "SIZE_2",
"convergence": "RELATIVE_MAX_CORE",
"coarsest_sweeps": 2,
"max_iters": 2,
"monitor_residual": 1,
"min_coarse_rows": 2,
"relaxation_factor": 0.75,
"scope": "main",
"max_levels": 1000,
"postsweeps": 2,
"tolerance":1e-10,
"norm": "L2",
"use_scalar_norm": 1,
"cycle": "V",
"store_res_history": 1,
"solver_verbose" : 1
}
}
Hi Marat,
I have been further debugging this, and I noticed that the c_value being passed here is wrong:
void AMG_Config::setNamedParameter(const string &name, const double &c_value, const std::string &current_scope, const std::string &new_scope, ParamDesc::iterator &param_desc_iter)
The params string in parse_json_file is correct, so the problem is related either to json_parser.Parse<0>(params.c_str()) or to import_json_object(json_parser, true);
Sorry, I was away for a holiday.
I got your email from the GitHub commit logs, but I think it is wrong since I got a not-delivered notification. You can get the binary here: https://drive.google.com/file/d/1qB1Q5SpqtsG54JVJrOst6S3lmFw56Lcd/view?usp=sharing
So, the value in rapidjson::Value in import_json_object() is correct, but the actual value c_value that is passed to setNamedParameter<double>() is wrong?
Here is the issue with RapidJSON.
Here is the buggy function in RapidJSON (I guess we have to debug third-party libs as well):

inline double Pow10(int n) {
    static const double e[] = { // 1e-308...1e308: 617 * 8 bytes = 4936 bytes
        1e-308, 1e-307, 1e-306, /* ... the full table of 617 powers of ten ... */ 1e+306, 1e+307, 1e+308
    };
    RAPIDJSON_ASSERT(n <= 308);
    return n < -308 ? 0.0 : e[n + 308];
}

Simply change this to pow(10, n) and it works.
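i.e. something like this (a sketch of the change just described, using std::pow from <cmath> instead of the table):

#include <cmath>

// Same contract as the table version: clamp below -308 to 0.0, assert the upper bound.
inline double Pow10(int n) {
    RAPIDJSON_ASSERT(n <= 308);
    return n < -308 ? 0.0 : std::pow(10.0, n);
}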
Was the N passed to that function wrong? I wonder why it worked in one case but not another.
No, the real code uses a lot of static global stuff, so I think at some point it runs out. I don't like the look-up table there, bad practice. N is passed right; look at the fix, it does not alter N.
I'm glad that you were able to identify the issue. I'm still not sure about the real reason for what's happening, but I agree that in our case the possible performance benefit of the lookup table is negligible and we can safely use pow. Since rapidjson is 3rd-party code, let me clarify a few things about making changes and perform a few tests.
Thanks for the follow-up. I have one more issue to resolve: for certain cases some of my ranks have 0 elements, which leads to a failure at matrix construction. What is the best way to go about this?
I can generate a communicator and only include ranks that have non-zero elements, which adds some collective-call overhead, or I can simulate it by pretending that a rank with zero elements has only one neighbor, which is itself. Since I don't know enough about what AMGX does under the hood, I would like to know your opinion on this; it happens frequently in our simulations due to lots of refinement/de-refinement. Also, does AMGX support ILU(k) as well (I see "ilu_sparsity_level")? I wonder if it has ILU by threshold too.
Spasiba for your help.
By the way, the following might be better than good ole pow:

inline double Pow10(int n) {
    std::string tmp = "1e" + std::to_string(n);
    double ret = std::stod(tmp, nullptr);
    RAPIDJSON_ASSERT(n <= 308);
    RAPIDJSON_ASSERT(n > -308);
    return ret;
}
One more question: is it possible to disable the printout of "Using Normal MPI (Hostbuffer) communicator..."? It is unnecessary for realistic big-case runs and just clutters the log file. Thanks again for your support.
Yep, will move it to a higher verbosity level.
Thanks,
Hi Marat, I had one more question on the previous comment. Most importantly, I have one more issue to resolve: for certain cases some of my ranks have 0 elements, which leads to a failure at matrix construction. What is the best way to go about this? I can generate a communicator and only include ranks that have non-zero elements, which adds some collective-call overhead, or I can simulate it by pretending that a rank with zero elements has only one neighbor, which is itself. Since I don't know enough about what AMGX does under the hood, I would like to know your opinion on this; it happens frequently in our simulations due to lots of refinement/de-refinement. Can AMGX handle solving several disconnected graphs? In my example it deadlocks. I could get it to work by defining a new MPI communicator, though!
We don't handle such cases specifically, so the result would be unpredictable. Are there a lot of such ranks? How often does the set of ranks with zero elements change? I would guess that the cumulative performance penalty, and where those penalties occur, would depend on the specifics of your problem and the way you call AMGX, but both approaches should work. If you need a working solution right now, it would probably be a good idea to try both suggestions from outside AMGX. But you are right - the more balanced the amount of data per GPU, the more throughput is achievable; this is one of the reasons that solving wells/reservoir together in a single matrix would be a bad idea. My wild guess is that no critical infrastructure changes would be needed to support this case, but some scoping is still required to see what's going on. If support for this case is ever implemented, it will be transparent to the user - you would just provide 0 for the rank's number of elements.
I think the most robust way to handle this is via communicators; it happens a lot in internal combustion engine simulations, as the mesh size in different phases changes drastically. Thanks for your feedback and support. We will soon try some realistic cases on a leadership-class cluster.
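For reference, the communicator approach is essentially this (a sketch with hypothetical names; the resulting sub-communicator is what gets passed to AMGX_resources_create on the active ranks):

#include <mpi.h>

// Split off the ranks that actually own matrix rows; ranks with zero elements
// get MPI_COMM_NULL and simply skip the AMGX setup/solve.
MPI_Comm make_amgx_comm(int local_num_rows, MPI_Comm world)
{
    int world_rank = 0;
    MPI_Comm_rank(world, &world_rank);

    MPI_Comm amgx_comm = MPI_COMM_NULL;
    const int color = (local_num_rows > 0) ? 0 : MPI_UNDEFINED;
    MPI_Comm_split(world, color, world_rank, &amgx_comm);
    return amgx_comm;
}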
I am trying to integrate AMGX into an industrial code. If I build it in debug mode, there are no parsing issues, but in release mode it gives the following error (the input file is AGGREGATION_JACOBI.json):
Converting config string to current config version
Error parsing parameter string: Incorrect config entry (number of equal signs is not 1) : "config_version": 2
To make things more interesting, if I use AMGX_config_create(&m_config, "config_version=2,algorithm=AGGREGATION,selector=SIZE_2,print_grid_stats=1,max_iters=1000,monitor_residual=1,obtain_timings=1,print_solve_stats=1,print_grid_stats=1"); it refuses to print the grid and solver stats but accepts the rest of the inputs.