Closed ztdepztdep closed 11 months ago
Does it support the Trilinos Epetra matrix, or can I read in a matrix in Matrix Market format?
Thank you for the question. No, the current version doesn't support Trilinos Epetra matrices. However, you can read in matrices in Matrix Market format. Thanks.
Great, but what if Epetra can provide the CRS data to ParGeMSLR?
We currently don't have those routines in the main branch, but I've just created a new branch called "matio" with some routines we are currently working on (not fully tested). They will be merged into the main branch in the future.
If you have the entire matrix in CSR format on the root processor (MPI rank 0), then you can modify the following test driver:
If your matrix is stored in distributed CSR format, please let me know and I can upload another test driver for you.
Hope this helps. Thank you!
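As a hedged illustration of that setup (my own sketch and names, not the referenced test driver): spreading a CSR matrix held entirely on rank 0 into contiguous block rows, one block per MPI rank, could look like this:

```cpp
// Hedged sketch, not the missing ParGeMSLR driver: scatter a CSR matrix that
// lives entirely on MPI rank 0 into contiguous block rows, one block per rank.
// Column indices stay global; row_starts[p] is the first global row of rank p.
#include <mpi.h>
#include <vector>

void scatter_csr(int n, const std::vector<int> &ia, const std::vector<int> &ja,
                 const std::vector<double> &a, std::vector<int> &loc_i,
                 std::vector<int> &loc_j, std::vector<double> &loc_a,
                 std::vector<int> &row_starts)
{
   int rank, np;
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &np);

   // Even block-row partition, computed identically on every rank.
   row_starts.resize(np + 1);
   for (int p = 0; p <= np; p++)
   {
      row_starts[p] = (int) ((long) n * p / np);
   }
   int r0 = row_starts[rank], r1 = row_starts[rank + 1];

   if (rank == 0)
   {
      // Ship each remote rank its slice of ia, ja, and a.
      for (int p = 1; p < np; p++)
      {
         int nnz0 = ia[row_starts[p]], nnz1 = ia[row_starts[p + 1]];
         int counts[2] = { row_starts[p + 1] - row_starts[p], nnz1 - nnz0 };
         MPI_Send(counts, 2, MPI_INT, p, 0, MPI_COMM_WORLD);
         MPI_Send(ia.data() + row_starts[p], counts[0] + 1, MPI_INT, p, 1, MPI_COMM_WORLD);
         MPI_Send(ja.data() + nnz0, counts[1], MPI_INT, p, 2, MPI_COMM_WORLD);
         MPI_Send(a.data() + nnz0, counts[1], MPI_DOUBLE, p, 3, MPI_COMM_WORLD);
      }
      // Keep rank 0's own block locally.
      loc_i.assign(ia.begin() + r0, ia.begin() + r1 + 1);
      loc_j.assign(ja.begin() + ia[r0], ja.begin() + ia[r1]);
      loc_a.assign(a.begin() + ia[r0], a.begin() + ia[r1]);
   }
   else
   {
      int counts[2];
      MPI_Recv(counts, 2, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      loc_i.resize(counts[0] + 1);
      loc_j.resize(counts[1]);
      loc_a.resize(counts[1]);
      MPI_Recv(loc_i.data(), counts[0] + 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Recv(loc_j.data(), counts[1], MPI_INT, 0, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Recv(loc_a.data(), counts[1], MPI_DOUBLE, 0, 3, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
   }

   // Make the local row pointers zero-based.
   int shift = loc_i[0];
   for (size_t i = 0; i < loc_i.size(); i++) loc_i[i] -= shift;
}
```

Each rank then holds a zero-based local row pointer array, global column indices, and the shared row_starts vector, matching the distributed layout described below.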
Thanks a lot. I use the distributed CSR format in Epetra.
Thanks. I've uploaded another function to transfer a distributed CSR matrix to ParCSR format. See the most recent commit on the "matio" branch.
You can directly use this function if you have the local CSR on each MPI processor and a global vector giving the row number of the first row on each MPI processor. For example, if
A = |1 2|  => stored on rank 0
    |3 4|  => stored on rank 1
then A0 = |1 2| is on rank 0, and A1 = |3 4| is on rank 1 in CSR format. The global vector is [0, 1, 2].
Here is an example: https://github.com/Hitenze/pargemslr/blob/951aa2de9558bdadaf38ed190b6bc2e4a54f610f/ParGeMSLR/TESTS/parallel/driver_laplacian_gemslrz_distcsr.cpp#L247-L251
You can replace row_starts, idxin, dist_i, dist_j, and dist_data with your own pointers.
Please let me know if you find any issues. Thank you!
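Since Epetra is the source of the matrix here, the following is a hedged sketch of building these inputs from an Epetra_CrsMatrix. The Epetra calls (ExtractMyRowView, ColMap().GID(), RowMap().MinMyGID()) are standard Trilinos API, but the int index widths and the assumption of a contiguous block-row map are assumptions, not something confirmed by the driver:

```cpp
// Hedged sketch, not a ParGeMSLR routine: pull the local CSR out of an
// Epetra_CrsMatrix (assumed to use a contiguous block-row map) and build the
// global row-start vector in the form described above, e.g. [0, 1, 2].
#include <mpi.h>
#include <vector>
#include <Epetra_CrsMatrix.h>

int get_dist_csr(const Epetra_CrsMatrix &A,
                 std::vector<int> &dist_i, std::vector<int> &dist_j,
                 std::vector<double> &dist_data, std::vector<int> &row_starts)
{
   int np;
   MPI_Comm_size(MPI_COMM_WORLD, &np);

   int nrows = A.NumMyRows();
   dist_i.assign(1, 0);
   dist_j.clear();
   dist_data.clear();

   // Epetra stores local column indices after FillComplete(), so map each
   // one back to its global ID, since dist_j needs global column indices.
   for (int i = 0; i < nrows; i++)
   {
      int     len;
      double *vals;
      int    *inds;
      if (A.ExtractMyRowView(i, len, vals, inds)) return -1;
      for (int k = 0; k < len; k++)
      {
         dist_j.push_back(A.ColMap().GID(inds[k]));
         dist_data.push_back(vals[k]);
      }
      dist_i.push_back((int) dist_j.size());
   }

   // First global row owned by each rank, plus the global size at the end.
   int my_first_row = A.RowMap().MinMyGID();
   row_starts.resize(np + 1);
   MPI_Allgather(&my_first_row, 1, MPI_INT,
                 row_starts.data(), 1, MPI_INT, MPI_COMM_WORLD);
   row_starts[np] = A.NumGlobalRows();
   return 0;
}
```

The resulting dist_i, dist_j, dist_data, and row_starts can then stand in for the pointers in the linked driver.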
I want to apply ParGeMSLR to a matrix obtained from the spectral element method. Could you please give me some suggestions about this? I currently use the Ifpack + Trilinos iterative solver, which doesn't exploit the structural characteristics of the matrix. I think we would get a great speedup.
An error occurred when I ran test 1. I also noticed that the gr_30_30_b.mtx file only has a few rows; is that right?
reading setup from file "inputs"
Start running tests. Total 1 tests.
Running test number 1
Solving general matrix ./gr_30_30.mtx
Solving with right-hand-side from file ./gr_30_30_b.mtx
Using zero vector as initial guess
free(): invalid size
free(): invalid size
Aborted (core dumped)
I tried to input my own matrix. It gives back the error:
Running test number 1
Solving general matrix ./input/Adiffusion.mtx
Sorry, this application does not support Market Market type: [matrix array real general]
Could you please help me out?
- We are very happy to help you with your application. Is it possible to provide us with some small test matrices generated using your code in CSR format? We can try them on our side first. Thank you.
- For the right-hand side file, the gr_30_30_b.mtx only has very few nonzeros. This file is only a toy problem for demonstrative purposes. You can replace it with your own right-hand side (see the format sketch after this list).
Regarding the memory error, thank you for reporting the bug. There might be some issues with the makefile of this temporary branch. We would recommend you clean and rebuild the library first:
- Go to folder pargemslr/ParGeMSLR
- make clean
- make -j
- Go to folder pargemslr/ParGeMSLR/TESTS/parallel
- make clean
- make -j
- If the error persists, please provide us with the operating system and hardware, and we will try our best to reproduce the issue and solve the problem as soon as possible. Thank you!
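On the "Market Market type: [matrix array real general]" error above: the bundled gr_30_30 files are sparse coordinate Matrix Market files, and the error suggests the array type is simply not supported. Here is a hedged sketch (a hypothetical helper, not a ParGeMSLR routine) of writing a right-hand side in the coordinate format:

```cpp
// Hypothetical helper, not part of ParGeMSLR: write a right-hand side b as a
// Matrix Market file of type "matrix coordinate real general", storing only
// the nonzero entries with 1-based indices.
#include <cstdio>
#include <vector>

void write_mtx_rhs(const char *filename, const std::vector<double> &b)
{
   int nnz = 0;
   for (double v : b) { if (v != 0.0) nnz++; }

   FILE *f = fopen(filename, "w");
   if (!f) return;
   fprintf(f, "%%%%MatrixMarket matrix coordinate real general\n");
   fprintf(f, "%zu 1 %d\n", b.size(), nnz);         // rows, cols, nonzeros
   for (size_t i = 0; i < b.size(); i++)
   {
      if (b[i] != 0.0)
      {
         fprintf(f, "%zu 1 %.15e\n", i + 1, b[i]);  // 1-based row index
      }
   }
   fclose(f);
}
```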
I think gr_30_30 is provided by you. Why can't it work with the first serial example code?
Yes, it is provided by us. It works fine on all the systems we have. Have you tried cleaning and recompiling the library? If it is still not working, please let me know the environment you are using. Thank you.
Do I need to set other parameters?
Start running tests. Total 1 tests.
Running test number 1
Solving general matrix ./gr_30_30.mtx
Solving with right-hand-side from file ./gr_30_30_b.mtx
Using zero vector as initial guess
free(): invalid size
free(): invalid size
Aborted (core dumped)
No need to set any parameters. The default setting should work for all the test drivers.
Are you using ParMETIS with int64 and double precision? What is your operating system? Thanks.
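For context on the int64 question, this is the standard ParMETIS/METIS build configuration rather than anything stated in this thread: the index and real widths are compile-time macros in the metis.h header bundled with ParMETIS, so a 64-bit, double-precision build is obtained by setting them before compiling:

```cpp
// In parmetis-4.0.3/metis/include/metis.h (path assumed from the 4.0.3
// source layout): select 64-bit indices and double-precision reals, then
// rebuild ParMETIS and relink ParGeMSLR against it.
#define IDXTYPEWIDTH  64
#define REALTYPEWIDTH 64
```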
Here is an example of the output on my side for your reference.
mpirun -np 1 ./driver_gen_gemslr_seq.ex
[screenshot of the solver output]
OK, I rebuilt ParMETIS with 64-bit integers, and I can now run the sequential code successfully. But the parallel code can't be built, even though the ParMETIS path has already been set. I use version 4.0.3. The error is:
/home/ztdep/Downloads/pargemslr/ParGeMSLR/SRC/matrices/matrixops.cpp:8077: undefined reference to `ParMETIS_V3_PartKway'
/usr/lib64/gcc/x86_64-suse-linux/7/../../../../x86_64-suse-linux/bin/ld: /home/ztdep/Downloads/pargemslr/ParGeMSLR/SRC/matrices/matrixops.cpp:8082: undefined reference to `ParMETIS_V3_RefineKway'
Our interface is for ParMETIS 4.0.3, so the version should be fine.
By default the path in makefile.in is a relative path; that might be the issue.
If you are setting an absolute path for ParMETIS, changing line 29 of TESTS/parallel/makefile should resolve the issue.
From
-L../../$(METIS_PATH)/libparmetis -lparmetis -L../../$(METIS_PATH)/libmetis -lmetis
to
-L$(METIS_PATH)/libparmetis -lparmetis -L$(METIS_PATH)/libmetis -lmetis
Thanks.
I ran the parallel code with my matrix. It seems that with more than 1 CPU it gets slower. In my application, I need to solve for different right-hand sides with A unchanged, so I am more concerned with the "solve time". What should I do to resolve this problem?
When I run with 1 CPU:
Time info:
Load matrix time: 0.388120s
Partition time: 0.000000s
Setup time: 1.233658s
Solve time: 0.166337s
Total time: 1.399995s
But when I run with 2 CPUs:
Time info:
Load matrix time: 0.427087s
Partition time: 0.000000s
Setup time: 0.540444s
Solve time: 3.033501s
Total time: 3.573946s
With 10 CPUs:
Time info:
Load matrix time: 0.465784s
Partition time: 0.000000s
Setup time: 0.113098s
Solve time: 1.312183s
Total time: 1.425281s
What is the size of your matrix? I can provide you with parameters based on it. Thank you.
How many rows?
22801 rows with 729061 nonzeros in total.
OK. Thank you.
Since the problem is not very large, reducing the number of levels might help. Is it possible to try with the following settings? Thank you!
GEMSLR 1. global_precond = PRECOND: global preconditioner option. (BJ, ESCHUR, GEMSLR).
1 2. use_global_partition = PREPROCESSING: Use global partition on the top level? (GEMSLR only, 1: yes, 0: no).
ILUT 3. B_precond_top = PRECOND: local preconditioner option for the top several levels (ILUT, ILUK, GEMSLR).
1 4. B_precond_top_levels = PRECOND: number of levels we apply the first preconditioner.
ILUT 5. B_precond_lower = PRECOND: local preconditioner option for other levels (ILUT, ILUK, GEMSLR).
BJILUT 6. C_precond = PRECOND: preconditioner option for the last C. (ILUT, ILUK, BJILUT, BJILUK, BJILUT2, BJILUK2).
1 7. C_iter = PRECOND: number of iterations with C^{-1} on the last level for BJ option (only for BJILUT2 or BJILUK2).
50 8. kdim = SOLVER: dimension of Krylov subspace in (outer) FGMRES.
500 9. maxits = SOLVER: maxits in outer FGMRES.
1e-06 10. tol = SOLVER: tolerance for FGMRES, stop when ||r|| < tol or ||r||/||b|| < tol.
0 11. absolute_tol = SOLVER: Use absolute tol ||r|| or relative tol ||r||/||b||.
3 2 12. nlev = PREPROCESSING: the number of levels of the global/local preconditioner.
32 2 13. ncomp = PREPROCESSING: the number of subdomains for the global/local GeMSLR.
RKWAY 14. global_partition_option = PREPROCESSING: the global partition option (ND, RKWAY). ParMETIS ND is used.
RKWAY 15. local_partition_option = PREPROCESSING: the B preconditioner partition option (ND, RKWAY).
1e-05 1e-05 1e-05 1e-05 16. ilu_droptol = PRECOND: tolerance of ILUT for the B, EF, S, and C blocks.
2000 2000 2000 17. ilu_max_row_nnz = PRECOND: maximum number of nonzeros per row of ILUT for the B, S, and C blocks.
2 2 2 18. ilu_lfil = PRECOND: level of fill of ILUK for the B, S, and C blocks.
RCM 19. ilu_perm_option = PRECOND: permutation option. (NO, RCM).
TR 20. lr_arnoldi_option1 = LOWRANK: the Low-rank Option in GEMSLR for the top level. (STD, TR, SUB).
TR 21. lr_arnoldi_option2 = LOWRANK: the Low-rank Option in GEMSLR for other levels. (STD, TR, SUB).
TR 22. lr_arnoldi_optionA = LOWRANK: the Low-rank Option in GEMSLR for A. (STD, TR, SUB).
32 0 0 23. lr_rank = LOWRANK: the target size of the low-rank correction on the top/other levels/A.
1e-03 1e-03 1e-03 24. lr_tol_eig = LOWRANK: eigenvalues with residual norm smaller than lr_tol_eig are considered converged.
1.0 1.0 1.0 25. lr_rank_factor = LOWRANK: the target number of converged eigenpairs is neig = lr_rankx * lr_rank_factor.
2.0 2.0 2.0 26. lr_arnoldi_factor = LOWRANK: the number of Arnoldi steps during each restart is msteps = lr_rankx * (1.0 + lr_rank_factor) * lr_arnoldi_factor.
10 10 10 27. lr_maxits = LOWRANK: the max number of restarts for the thick-restart Arnoldi or subspace iteration.
0 28. inner_iteration = SCHUR: apply FGMRES on the top level Schur Complement system in the solve phase (1: yes, 0: no).
1e-12 29. inner_iter_tol = SCHUR: the tolerance of the SCHUR-FGMRES.
5 30. inner_iter_maxits = SCHUR: the max number of FGMRES outer iteration of the SCHUR-FGMRES.
0 31. diag_shift = ADVANCED: Complex Version Only. Enable complex shift in the complex ILUT? (1: yes, 0: no).
LU 32. global_solve = ADVANCED: the global solve option for GeMSLR (LU, U, MUL).
1 1 2 33. npx, npy, npz = PARALLEL: nproc decomposition on each direction (for laplacian only, default is 1 np 1).
4 4 2 34. ndx, ndy, ndz = PARALLEL: domain decomposition on each direction (for laplacian only).
1 35. print_level = GENERAL: the print option. 0: basic info only; 1: more info; 2: even more info with gnuplot graphics.
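As a worked reading of the low-rank formulas in options 24 to 27 above (using the suggested values, so an illustration rather than anything stated in the thread): with lr_rank = 32 on the top level, lr_rank_factor = 1.0, and lr_arnoldi_factor = 2.0, the target is neig = 32 * 1.0 = 32 converged eigenpairs, each restart performs msteps = 32 * (1.0 + 1.0) * 2.0 = 128 Arnoldi steps, and at most 10 restarts are attempted.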
If the number of iterations is too large, increasing the rank on line 23 from 32 to 64 might help.
This is the size of my matrix: 10320484 rows with 80147883 nonzeros in total. Could you provide me with parameters based on it? Thank you very much.
Hi, thank you for your question! It appears that the fill-in of the problem is not very high. If your problem is not highly indefinite, you might want to consider the following option:
3 2 12. nlev = PREPROCESSING: the number of levels of the global/local preconditioner.
1024 2 13. ncomp = PREPROCESSING: the number of subdomains for the global/local GeMSLR.
....
1e-03 1e-03 1e-03 1e-03 16. ilu_droptol = PRECOND: tolerance of ILUT for the B, EF, S, and C blocks.
500 500 500 17. ilu_max_row_nnz = PRECOND: maximum number of nonzeros per row of ILUT for the B, S, and C blocks.
...
50 0 0 23. lr_rank = LOWRANK: the target size of the low-rank correction on the top/other levels/A.
If convergence is not achieved, for instance, in cases where the problem exhibits high indefiniteness, one option is to increase the number of subdomains (to 2048 or even 4096) and reduce the drop tolerance for ILU.
3 2 12. nlev = PREPROCESSING: the number of levels of the global/local preconditioner.
2048 2 13. ncomp = PREPROCESSING: the number of subdomains for the global/local GeMSLR.
....
1e-05 1e-05 1e-05 1e-05 16. ilu_droptol = PRECOND: tolerance of ILUT for the B, EF, S, and C blocks.
2000 2000 2000 17. ilu_max_row_nnz = PRECOND: maximum number of nonzeros per row of ILUT for the B, S, and C blocks.
...
100 0 0 23. lr_rank = LOWRANK: the target size of the low-rank correction on the top/other levels/A.
Thank you!
Thank you very much for your help. I made some modifications based on your suggestions. Currently, I am using 32 cores for computation, and it takes 13 minutes and 39 seconds. The residual is less than 1e-3. I hope to reduce the residual to below 1e-5. After many attempts, I haven't achieved the desired effect. I kindly request your advice again. Here are the parameters I am currently using: inputs2.txt
Thank you for reaching out! I hope the following option proves useful:
Additionally, if your budget permits, you might consider:
Furthermore, if your problem is highly indefinite, you might find it beneficial to activate option 31 (diag_shift).
If these adjustments do not resolve the issue, would you be able to share the matrix with me? I can conduct some experiments on my end and offer more specific recommendations.
Thank you!
First of all, thank you very much for your suggestions. Your suggestions have been of great help to me. I tried the parameters you provided, but unfortunately, the solving process was extremely slow. In desperation, I made some modifications based on my own parameters, and the results improved significantly compared to before. I used 32 cores for computation, and it took 5 minutes and 10 seconds to achieve a residual of less than 1e-3. However, I’m sorry to say that even after a long time, I couldn’t reach a residual of less than 1e-4, which I was hoping for. Here are my parameters: inputs2.1.txt As for sharing my matrix with you, I can certainly do that. Could you please add me on QQ: 1464947177? I will transfer my matrix to you.
Thank you for your feedback. I will contact you later for the matrix.