Closed ztdepztdep closed 11 months ago
Does it support the Trilinos Epetra matrix, or can I read in a matrix in Matrix Market format?
Thank you for the question. No, the current version doesn't support Trilinos Epetra matrices. However, you can read in matrices in Matrix Market format. Thanks.
Great, but what if Epetra can provide the CRS data to ParGeMSLR?
We currently don't have those routines in the main branch, but I've just created a new branch called "matio" with some routines we are currently working on (not fully tested). They will be merged into the main branch in the future.
If you have the entire matrix in CSR format on the root processor (MPI rank 0), then you can modify the following test driver:
If your matrix is stored in distributed CSR format, please let me know and I can upload another test driver for you.
Hope this helps. Thank you!
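As a hedged illustration of that setup (my own sketch and names, not the referenced test driver): spreading a CSR matrix held entirely on rank 0 into contiguous block rows, one block per MPI rank, could look like this:

```cpp
// Hedged sketch, not the missing ParGeMSLR driver: scatter a CSR matrix that
// lives entirely on MPI rank 0 into contiguous block rows, one block per rank.
// Column indices stay global; row_starts[p] is the first global row of rank p.
#include <mpi.h>
#include <vector>

void scatter_csr(int n, const std::vector<int> &ia, const std::vector<int> &ja,
                 const std::vector<double> &a, std::vector<int> &loc_i,
                 std::vector<int> &loc_j, std::vector<double> &loc_a,
                 std::vector<int> &row_starts)
{
   int rank, np;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &np);

   // Even block-row partition, computed identically on every rank.
   row_starts.resize(np + 1);
   for (int p = 0; p <= np; p++)
   {
      row_starts[p] = (int) ((long) n * p / np);
   }
   int r0 = row_starts[rank], r1 = row_starts[rank + 1];

   if (rank == 0)
   {
      // Ship each remote rank its slice of ia, ja, and a.
      for (int p = 1; p < np; p++)
      {
         int nnz0 = ia[row_starts[p]], nnz1 = ia[row_starts[p + 1]];
         int counts[2] = { row_starts[p + 1] - row_starts[p], nnz1 - nnz0 };
         MPI_Send(counts, 2, MPI_INT, p, 0, MPI_COMM_WORLD);
         MPI_Send(ia.data() + row_starts[p], counts[0] + 1, MPI_INT, p, 1, MPI_COMM_WORLD);
         MPI_Send(ja.data() + nnz0, counts[1], MPI_INT, p, 2, MPI_COMM_WORLD);
         MPI_Send(a.data() + nnz0, counts[1], MPI_DOUBLE, p, 3, MPI_COMM_WORLD);
      }
      // Keep rank 0's own block locally.
      loc_i.assign(ia.begin() + r0, ia.begin() + r1 + 1);
      loc_j.assign(ja.begin() + ia[r0], ja.begin() + ia[r1]);
      loc_a.assign(a.begin() + ia[r0], a.begin() + ia[r1]);
   }
   else
   {
      int counts[2];
      MPI_Recv(counts, 2, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      loc_i.resize(counts[0] + 1);
      loc_j.resize(counts[1]);
      loc_a.resize(counts[1]);
      MPI_Recv(loc_i.data(), counts[0] + 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Recv(loc_j.data(), counts[1], MPI_INT, 0, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Recv(loc_a.data(), counts[1], MPI_DOUBLE, 0, 3, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
   }

   // Make the local row pointers zero-based.
   int shift = loc_i[0];
   for (size_t i = 0; i < loc_i.size(); i++) loc_i[i] -= shift;
}
```

Each rank then holds a zero-based local row pointer array, global column indices, and the shared row_starts vector, matching the distributed layout described below.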
Thanks a lot. I use the distributed CSR format in Epetra.
Thanks. I've uploaded another function to transfer a distributed CSR matrix to ParCSR format. See the most recent commit on the "matio" branch.
You can directly use this function if you have the local CSR on each MPI processor and a global vector giving the row number of the first row on each MPI processor. For example, if
A = |1 2|  => stored on rank 0
    |3 4|  => stored on rank 1
then A0 = |1 2| is on rank 0, and A1 = |3 4| is on rank 1 in CSR format. The global vector is [0, 1, 2].
Here is an example: https://github.com/Hitenze/pargemslr/blob/951aa2de9558bdadaf38ed190b6bc2e4a54f610f/ParGeMSLR/TESTS/parallel/driver_laplacian_gemslrz_distcsr.cpp#L247-L251
You can replace row_starts, idxin, dist_i, dist_j, and dist_data with your own pointers.
Please let me know if you find any issues. Thank you!
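Since Epetra is the source of the matrix here, the following is a hedged sketch of building these inputs from an Epetra_CrsMatrix. The Epetra calls (ExtractMyRowView, ColMap().GID(), RowMap().MinMyGID()) are standard Trilinos API, but the int index widths and the assumption of a contiguous block-row map are assumptions, not something confirmed by the driver:

```cpp
// Hedged sketch, not a ParGeMSLR routine: pull the local CSR out of an
// Epetra_CrsMatrix (assumed to use a contiguous block-row map) and build the
// global row-start vector in the form described above, e.g. [0, 1, 2].
#include <mpi.h>
#include <vector>
#include <Epetra_CrsMatrix.h>

int get_dist_csr(const Epetra_CrsMatrix &A,
                 std::vector<int> &dist_i, std::vector<int> &dist_j,
                 std::vector<double> &dist_data, std::vector<int> &row_starts)
{
   int np;
   MPI_Comm_size(MPI_COMM_WORLD, &np);

   int nrows = A.NumMyRows();
   dist_i.assign(1, 0);
   dist_j.clear();
   dist_data.clear();

   // Epetra stores local column indices after FillComplete(), so map each
   // one back to its global ID, since dist_j needs global column indices.
   for (int i = 0; i < nrows; i++)
   {
      int     len;
      double *vals;
      int    *inds;
      if (A.ExtractMyRowView(i, len, vals, inds)) return -1;
      for (int k = 0; k < len; k++)
      {
         dist_j.push_back(A.ColMap().GID(inds[k]));
         dist_data.push_back(vals[k]);
      }
      dist_i.push_back((int) dist_j.size());
   }

   // First global row owned by each rank, plus the global size at the end.
   int my_first_row = A.RowMap().MinMyGID();
   row_starts.resize(np + 1);
   MPI_Allgather(&my_first_row, 1, MPI_INT,
                 row_starts.data(), 1, MPI_INT, MPI_COMM_WORLD);
   row_starts[np] = A.NumGlobalRows();
   return 0;
}
```

The resulting dist_i, dist_j, dist_data, and row_starts can then stand in for the pointers in the linked driver.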
I want to apply ParGeMSLR to a matrix obtained from the spectral element method. Could you please give me some suggestions about this? I currently use the Ifpack + Trilinos iterative solver, which doesn't exploit the structural characteristics of the matrix. I think we would get a great speedup.
An error occurred when I ran test 1. I also noticed that the gr_30_30_b.mtx file only has a few rows; is that right?
reading setup from file "inputs"
Start running tests. Total 1 tests.
Running test number 1
Solving general matrix ./gr_30_30.mtx
Solving with right-hand-side from file ./gr_30_30_b.mtx
Using zero vector as initial guess
free(): invalid size
free(): invalid size
Aborted (core dumped)
I tried to input my own matrix. It gives back the error:
Running test number 1
Solving general matrix ./input/Adiffusion.mtx
Sorry, this application does not support Market Market type: [matrix array real general]
Could you please help me out?
- We are very happy to help you with your application. Is it possible to provide us with some small test matrices generated using your code in CSR format? We can try them on our side first. Thank you.
- For the right-hand side file, the gr_30_30_b.mtx only has very few nonzeros. This file is only a toy problem for demonstrative purposes. You can replace it with your own right-hand side (see the format sketch after this list).
Regarding the memory error, thank you for reporting the bug. There might be some issues with the makefile of this temporary branch. We would recommend you clean and rebuild the library first:
- Go to folder pargemslr/ParGeMSLR
- make clean
- make -j
- Go to folder pargemslr/ParGeMSLR/TESTS/parallel
- make clean
- make -j
- If the error persists, please provide us with the operating system and hardware, and we will try our best to reproduce the issue and solve the problem as soon as possible. Thank you!
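On the "Market Market type: [matrix array real general]" error above: the bundled gr_30_30 files are sparse coordinate Matrix Market files, and the error suggests the array type is simply not supported. Here is a hedged sketch (a hypothetical helper, not a ParGeMSLR routine) of writing a right-hand side in the coordinate format:

```cpp
// Hypothetical helper, not part of ParGeMSLR: write a right-hand side b as a
// Matrix Market file of type "matrix coordinate real general", storing only
// the nonzero entries with 1-based indices.
#include <cstdio>
#include <vector>

void write_mtx_rhs(const char *filename, const std::vector<double> &b)
{
   int nnz = 0;
   for (double v : b) { if (v != 0.0) nnz++; }

   FILE *f = fopen(filename, "w");
   if (!f) return;
   fprintf(f, "%%%%MatrixMarket matrix coordinate real general\n");
   fprintf(f, "%zu 1 %d\n", b.size(), nnz);         // rows, cols, nonzeros
   for (size_t i = 0; i < b.size(); i++)
   {
      if (b[i] != 0.0)
      {
         fprintf(f, "%zu 1 %.15e\n", i + 1, b[i]);  // 1-based row index
      }
   }
   fclose(f);
}
```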
I think gr_30_30 is provided by you. Why can't it work with the first serial example code?
Yes, it is provided by us. It works fine on all the systems we have. Have you tried cleaning and recompiling the library? If it is still not working, please let me know the environment you are using. Thank you.
Do I need to set other parameters?
Start running tests. Total 1 tests.
Running test number 1
Solving general matrix ./gr_30_30.mtx
Solving with right-hand-side from file ./gr_30_30_b.mtx
Using zero vector as initial guess
free(): invalid size
free(): invalid size
Aborted (core dumped)
No need to set any parameters. The default setting should work for all the test drivers.
Are you using ParMETIS with int64 and double precision? What is your operating system? Thanks.
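For context on the int64 question, this is the standard ParMETIS/METIS build configuration rather than anything stated in this thread: the index and real widths are compile-time macros in the metis.h header bundled with ParMETIS, so a 64-bit, double-precision build is obtained by setting them before compiling:

```cpp
// In parmetis-4.0.3/metis/include/metis.h (path assumed from the 4.0.3
// source layout): select 64-bit indices and double-precision reals, then
// rebuild ParMETIS and relink ParGeMSLR against it.
#define IDXTYPEWIDTH  64
#define REALTYPEWIDTH 64
```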
Here is an example of the output on my side for your reference.
mpirun -np 1 ./driver_gen_gemslr_seq.ex
[screenshot of the solver output]
OK, I rebuilt ParMETIS with 64-bit integers, and I can now run the sequential code successfully. But the parallel code can't be built, even though the ParMETIS path has already been set. I use version 4.0.3. The error is:
/home/ztdep/Downloads/pargemslr/ParGeMSLR/SRC/matrices/matrixops.cpp:8077: undefined reference to `ParMETIS_V3_PartKway'
/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /home/ztdep/Downloads/pargemslr/ParGeMSLR/SRC/matrices/matrixops.cpp:8082: undefined reference to `ParMETIS_V3_RefineKway'
Our interface is for ParMETIS 4.0.3, so the version should be fine.
By default the path in makefile.in is a relative path; that might be the issue.
If you are setting an absolute path for ParMETIS, changing line 29 of TESTS/parallel/makefile should resolve the issue.
From
-L../../$(METIS_PATH)/libparmetis -lparmetis -L../../$(METIS_PATH)/libmetis -lmetis
to
-L$(METIS_PATH)/libparmetis -lparmetis -L$(METIS_PATH)/libmetis -lmetis
Thanks.
I ran the parallel code with my matrix. It seems that with more than 1 CPU it gets slower. In my application, I need to solve for different right-hand sides with A unchanged, so I am more concerned with the "solve time". What should I do to resolve this problem?
When I run with 1 CPU:
Time info:
Load matrix time: 0.388120s
Partition time: 0.000000s
Setup time: 1.233658s
Solve time: 0.166337s
Total time: 1.399995s
But when I run with 2 CPUs:
Time info:
Load matrix time: 0.427087s
Partition time: 0.000000s
Setup time: 0.540444s
Solve time: 3.033501s
Total time: 3.573946s
With 10 CPUs:
Time info:
Load matrix time: 0.465784s
Partition time: 0.000000s
Setup time: 0.113098s
Solve time: 1.312183s
Total time: 1.425281s
What is the size of your matrix? I can provide you with parameters based on it. Thank you.
How many rows?
22801 rows with 729061 nonzeros in total.
OK. Thank you.
Since the problem is not very large, reducing the number of levels might help. Is it possible to try with the following settings? Thank you!
GEMSLR 1. global_precond = PRECOND: global preconditioner option. (BJ, ESCHUR, GEMSLR).
1 2. use_global_partition = PREPROCESSING: Use global partition on the top level? (GEMSLR only, 1: yes, 0: no).
ILUT 3. B_precond_top = PRECOND: local preconditioner option for the top several levels (ILUT, ILUK, GEMSLR).
1 4. B_precond_top_levels = PRECOND: number of levels we apply the first preconditioner.
ILUT 5. B_precond_lower = PRECOND: local preconditioner option for other levels (ILUT, ILUK, GEMSLR).
BJILUT 6. C_precond = PRECOND: preconditioner option for the last C. (ILUT, ILUK, BJILUT, BJILUK, BJILUT2, BJILUK2).
1 7. C_iter = PRECOND: number of iterations with C^{-1} on the last level for BJ option (only for BJILUT2 or BJILUK2).
50 8. kdim = SOLVER: dimension of Krylov subspace in (outer) FGMRES.
500 9. maxits = SOLVER: maxits in outer FGMRES.
1e-06 10. tol = SOLVER: tolerance for FGMRES, stop when ||r|| < tol or ||r||/||b|| < tol.
0 11. absolute_tol = SOLVER: Use absolute tol ||r|| or relative tol ||r||/||b||.
3 2 12. nlev = PREPROCESSING: the number of levels of the global/local preconditioner.
32 2 13. ncomp = PREPROCESSING: the number of subdomains for the global/local GeMSLR.
RKWAY 14. global_partition_option = PREPROCESSING: the global partition option (ND, RKWAY). ParMETIS ND is used.
RKWAY 15. local_partition_option = PREPROCESSING: the B preconditioner partition option (ND, RKWAY).
1e-05 1e-05 1e-05 1e-05 16. ilu_droptol = PRECOND: tolerance of ILUT for the B, EF, S, and C blocks.
2000 2000 2000 17. ilu_max_row_nnz = PRECOND: maximum number of nonzeros per row of ILUT for the B, S, and C blocks.
2 2 2 18. ilu_lfil = PRECOND: level of fill of ILUK for the B, S, and C blocks.
RCM 19. ilu_perm_option = PRECOND: permutation option. (NO, RCM).
TR 20. lr_arnoldi_option1 = LOWRANK: the Low-rank Option in GEMSLR for the top level. (STD, TR, SUB).
TR 21. lr_arnoldi_option2 = LOWRANK: the Low-rank Option in GEMSLR for other levels. (STD, TR, SUB).
TR 22. lr_arnoldi_optionA = LOWRANK: the Low-rank Option in GEMSLR for A. (STD, TR, SUB).
32 0 0 23. lr_rank = LOWRANK: the target size of the low-rank correction on the top/other levels/A.
1e-03 1e-03 1e-03 24. lr_tol_eig = LOWRANK: eigenvalues with residual norm smaller than lr_tol_eig are considered converged.
1.0 1.0 1.0 25. lr_rank_factor = LOWRANK: the target number of converged eigenpairs is neig = lr_rankx * lr_rank_factor.
2.0 2.0 2.0 26. lr_arnoldi_factor = LOWRANK: the number of Arnoldi steps during each restart is msteps = lr_rankx * (1.0 + lr_rank_factor) * lr_arnoldi_factor.
10 10 10 27. lr_maxits = LOWRANK: the max number of restarts for the thick-restart Arnoldi or subspace iteration.
0 28. inner_iteration = SCHUR: apply FGMRES on the top level Schur Complement system in the solve phase (1: yes, 0: no).
1e-12 29. inner_iter_tol = SCHUR: the tolerance of the SCHUR-FGMRES.
5 30. inner_iter_maxits = SCHUR: the max number of FGMRES outer iteration of the SCHUR-FGMRES.
0 31. diag_shift = ADVANCED: Complex Version Only. Enable complex shift in the complex ILUT? (1: yes, 0: no).
LU 32. global_solve = ADVANCED: the global solve option for GeMSLR (LU, U, MUL).
1 1 2 33. npx, npy, npz = PARALLEL: nproc decomposition on each direction (for laplacian only, default is 1 np 1).
4 4 2 34. ndx, ndy, ndz = PARALLEL: domain decomposition on each direction (for laplacian only).
1 35. print_level = GENERAL: the print option. 0: basic info only; 1: more info; 2: even more info with gnuplot graphics.
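As a worked reading of the low-rank formulas in options 24 to 27 above (using the suggested values, so an illustration rather than anything stated in the thread): with lr_rank = 32 on the top level, lr_rank_factor = 1.0, and lr_arnoldi_factor = 2.0, the target is neig = 32 * 1.0 = 32 converged eigenpairs, each restart performs msteps = 32 * (1.0 + 1.0) * 2.0 = 128 Arnoldi steps, and at most 10 restarts are attempted.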
If the number of iterations is too large, increasing the rank on line 23 from 32 to 64 might help.
This is the size of my matrix: 10320484 rows with 80147883 nonzeros in total. Could you provide me with parameters based on it? Thank you very much.
Hi, thank you for your question! It appears that the fill-in of the problem is not very high. If your problem is not highly indefinite, you might want to consider the following option:
3 2 12. nlev = PREPROCESSING: the number of levels of the global/local preconditioner.
1024 2 13. ncomp = PREPROCESSING: the number of subdomains for the global/local GeMSLR.
....
1e-03 1e-03 1e-03 1e-03 16. ilu_droptol = PRECOND: tolerance of ILUT for the B, EF, S, and C blocks.
500 500 500 17. ilu_max_row_nnz = PRECOND: maximum number of nonzeros per row of ILUT for the B, S, and C blocks.
...
50 0 0 23. lr_rank = LOWRANK: the target size of the low-rank correction on the top/other levels/A.
If convergence is not achieved, for instance, in cases where the problem exhibits high indefiniteness, one option is to increase the number of subdomains (to 2048 or even 4096) and reduce the drop tolerance for ILU.
3 2 12. nlev = PREPROCESSING: the number of levels of the global/local preconditioner.
2048 2 13. ncomp = PREPROCESSING: the number of subdomains for the global/local GeMSLR.
....
1e-05 1e-05 1e-05 1e-05 16. ilu_droptol = PRECOND: tolerance of ILUT for the B, EF, S, and C blocks.
2000 2000 2000 17. ilu_max_row_nnz = PRECOND: maximum number of nonzeros per row of ILUT for the B, S, and C blocks.
...
100 0 0 23. lr_rank = LOWRANK: the target size of the low-rank correction on the top/other levels/A.
Thank you!
Thank you very much for your help. I made some modifications based on your suggestions. Currently, I am using 32 cores for computation, and it takes 13 minutes and 39 seconds. The residual is less than 1e-3. I hope to reduce the residual to below 1e-5. After many attempts, I haven't achieved the desired effect. I kindly request your advice again. Here are the parameters I am currently using: inputs2.txt
Thank you for reaching out! I hope the following option proves useful:
Additionally, if your budget permits, you might consider:
Furthermore, if your problem is highly indefinite, you might find it beneficial to activate option 31 (diag_shift).
If these adjustments do not resolve the issue, would you be able to share the matrix with me? I can conduct some experiments on my end and offer more specific recommendations.
Thank you!
First of all, thank you very much for your suggestions. Your suggestions have been of great help to me. I tried the parameters you provided, but unfortunately, the solving process was extremely slow. In desperation, I made some modifications based on my own parameters, and the results improved significantly compared to before. I used 32 cores for computation, and it took 5 minutes and 10 seconds to achieve a residual of less than 1e-3. However, I’m sorry to say that even after a long time, I couldn’t reach a residual of less than 1e-4, which I was hoping for. Here are my parameters: inputs2.1.txt As for sharing my matrix with you, I can certainly do that. Could you please add me on QQ: 1464947177? I will transfer my matrix to you.
Thank you for your feedback. I will contact you later for the matrix.