saurabh636 opened this issue 3 years ago
Have you seen this? https://amgcl.readthedocs.io/en/latest/tutorial/poisson3DbMPI.html
Hi Denis
Thanks, I have seen that example, but I am looking to run shared-memory parallel, not MPI.
Anything?
Thanks Saurabh
I am sorry, I do not understand you then. Shared-memory parallel sounds like OpenMP to me: https://amgcl.readthedocs.io/en/latest/tutorial/poisson3Db.html
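For reference, a minimal shared-memory sketch along the lines of that tutorial (the builtin backend runs OpenMP-parallel when compiled with OpenMP enabled). The CRS arrays and right-hand side are assumed to be assembled already, and the choice of components is just one reasonable default:

```cpp
#include <vector>
#include <tuple>

#include <amgcl/backend/builtin.hpp>
#include <amgcl/adapter/crs_tuple.hpp>
#include <amgcl/make_solver.hpp>
#include <amgcl/amg.hpp>
#include <amgcl/coarsening/smoothed_aggregation.hpp>
#include <amgcl/relaxation/spai0.hpp>
#include <amgcl/solver/bicgstab.hpp>

// The builtin (OpenMP) backend:
typedef amgcl::backend::builtin<double> Backend;

// AMG-preconditioned BiCGStab:
typedef amgcl::make_solver<
    amgcl::amg<
        Backend,
        amgcl::coarsening::smoothed_aggregation,
        amgcl::relaxation::spai0
        >,
    amgcl::solver::bicgstab<Backend>
    > Solver;

std::vector<double> solve_shared_memory(
        ptrdiff_t n,
        const std::vector<ptrdiff_t> &ptr,
        const std::vector<ptrdiff_t> &col,
        const std::vector<double>    &val,
        const std::vector<double>    &rhs)
{
    // Adapt the raw CRS arrays on the fly and build the solver:
    Solver solve(std::tie(n, ptr, col, val));

    std::vector<double> x(n, 0.0);

    size_t iters;
    double error;
    std::tie(iters, error) = solve(rhs, x);

    return x;
}
```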
My apologies, the situation is indeed a bit complicated. What I have is a domain decomposition setup for FE, but it does not directly rely on MPI; instead it is a neutral/transparent interface/API to both SMP and MPI. I can set up FE matrices on each proc or thread, but I cannot, for example, use amgcl/mpi, since I cannot pass a "world" to the solver. I guess what I need is to re-implement AMGCL/mpi for my native interface. You think this is feasible? Again, any pointers?
instead it is a neutral/transparent interface/API to both SMP and MPI
If you have MPI under the abstraction, maybe it is possible to somehow get the MPI communicator from the abstraction? Another option (if you want to test whether amgcl is able to solve your problem before diving into the code) would be to save the systems assembled for each subdomain and feed those to examples/mpi/mpi_solver in Matrix Market format.
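As an illustration of the dump step, here is a hand-rolled Matrix Market (coordinate format) writer for the local chunk of rows; amgcl also ships Matrix Market I/O in amgcl/io/mm.hpp that could be used instead, and the exact file layout examples/mpi/mpi_solver expects is best checked against that example:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Dump the chunk of rows owned by this subdomain in MatrixMarket coordinate
// format. Columns are globally numbered; row indices are written as global
// as well, so the per-subdomain files can be inspected or combined later.
// nrows/ncols are the *global* matrix dimensions.
void dump_chunk_mtx(const std::string &fname,
        ptrdiff_t row_beg,                    // first global row owned here
        ptrdiff_t nrows, ptrdiff_t ncols,     // global matrix dimensions
        const std::vector<ptrdiff_t> &ptr,    // local CRS row pointers
        const std::vector<ptrdiff_t> &col,    // global column indices
        const std::vector<double>    &val)
{
    std::FILE *f = std::fopen(fname.c_str(), "w");
    if (!f) return;

    std::fprintf(f, "%%%%MatrixMarket matrix coordinate real general\n");
    std::fprintf(f, "%td %td %td\n", nrows, ncols, (ptrdiff_t)col.size());

    for (ptrdiff_t i = 0; i + 1 < (ptrdiff_t)ptr.size(); ++i)
        for (ptrdiff_t k = ptr[i]; k < ptr[i + 1]; ++k)
            // MatrixMarket indices are 1-based:
            std::fprintf(f, "%td %td %.16e\n",
                    row_beg + i + 1, col[k] + 1, val[k]);

    std::fclose(f);
}
```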
I guess what I need is to re-implement AMGCL/mpi for my native interface. You think this is feasible? Again, any pointers?
It should be possible, but not easy. You will have to look into the amgcl/mpi source code for any pointers. You will probably need to abstract away (or reimplement) all of the MPI calls there.
Unfortunately, I cannot access the MPI communicator, but I will give re-implementation on top of the native API a shot.
Hi
For AMGCL's parallel solver, the examples suggest using AMGCL's distributed matrix:
auto A = std::make_shared<amgcl::mpi::distributed_matrix<Backend>>(comm, local, remote);
where the local and remote matrices are "crs" objects.
Also, I would like to use a user-defined matrix that implements rows_impl<> and works in serial with distributed_matrix as well. How can I do that, given that distributed_matrix appears to be initialized with "crs" matrices?
Thanks for your feedback.
You can just send local strips of the system matrix to each process (where each strip contains a sequential chunk of rows, and the columns are globally numbered): https://github.com/ddemidov/amgcl/blob/720f6b7bc325a8a87df9ae33da125a80069b1d5c/amgcl/mpi/make_solver.hpp#L90-L99
The matrix will be split into local and remote parts behind the scenes. This should also accept a matrix adapter (including your own that implements rows_impl<>).
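To make that concrete, here is a sketch (component choices and names are mine, following the MPI tutorial linked earlier) where each process passes only its own strip of rows, with locally numbered rows and globally numbered columns, and lets amgcl do the local/remote split internally:

```cpp
#include <vector>
#include <tuple>
#include <mpi.h>

#include <amgcl/backend/builtin.hpp>
#include <amgcl/adapter/crs_tuple.hpp>
#include <amgcl/mpi/make_solver.hpp>
#include <amgcl/mpi/amg.hpp>
#include <amgcl/mpi/coarsening/smoothed_aggregation.hpp>
#include <amgcl/mpi/relaxation/spai0.hpp>
#include <amgcl/mpi/solver/bicgstab.hpp>

typedef amgcl::backend::builtin<double> Backend;

typedef amgcl::mpi::make_solver<
    amgcl::mpi::amg<
        Backend,
        amgcl::mpi::coarsening::smoothed_aggregation<Backend>,
        amgcl::mpi::relaxation::spai0<Backend>
        >,
    amgcl::mpi::solver::bicgstab<Backend>
    > Solver;

// chunk: number of rows owned by this process;
// ptr/col/val: the local strip in CRS format with global column numbers;
// rhs: the local part of the right-hand side.
std::vector<double> solve_distributed(
        MPI_Comm comm, ptrdiff_t chunk,
        const std::vector<ptrdiff_t> &ptr,
        const std::vector<ptrdiff_t> &col,
        const std::vector<double>    &val,
        const std::vector<double>    &rhs)
{
    amgcl::mpi::communicator world(comm);

    // The local strip is adapted on the fly; amgcl separates the local
    // (diagonal) and remote (off-diagonal) blocks internally.
    Solver solve(world, std::tie(chunk, ptr, col, val));

    std::vector<double> x(chunk, 0.0);

    size_t iters;
    double error;
    std::tie(iters, error) = solve(rhs, x);

    return x;
}
```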
Does a row need to be entirely on a single proc, or could it be split across procs as well?
My matrix adapter is based on CRS as well; it contains (GlobalRow -> GlobalCols -> Vals) triplets on each proc. A given (GlobalRow, GlobalCol) pair may repeat on multiple procs, which implies the 'Vals' need to be added up for that pair of row and column. Will that work?
Does a row need to be entirely on a single proc, or could it be split across procs as well?
Yes, each row has to be owned by a single process.
My matrix adapter is based on CRS as well; it contains (GlobalRow -> GlobalCols -> Vals) triplets on each proc. A given (GlobalRow, GlobalCol) pair may repeat on multiple procs, which implies the 'Vals' need to be added up for that pair of row and column. Will that work?
No, you will have to finish the assembly (sum the contributions to the same nonzero coming from different processes) before sending it to amgcl.
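For illustration, the local half of that "finish the assembly" step might look like the hypothetical helper below (moving off-process contributions to the rank that owns the row still needs whatever communication layer the application already has): once a process holds all (row, col, val) contributions for its own rows, duplicates are summed and the result is converted to the CRS arrays that get passed to amgcl.

```cpp
#include <map>
#include <vector>

// Sum duplicate (row, col) contributions for the rows owned by this process
// and convert the result to CRS. Rows are numbered locally (0 .. chunk-1),
// columns globally.
void assemble_local_strip(
        ptrdiff_t chunk,
        const std::vector<ptrdiff_t> &rows,  // local row of each contribution
        const std::vector<ptrdiff_t> &cols,  // global column of each contribution
        const std::vector<double>    &vals,  // value of each contribution
        std::vector<ptrdiff_t> &ptr,
        std::vector<ptrdiff_t> &col,
        std::vector<double>    &val)
{
    // Accumulate duplicates; std::map keeps each row's columns sorted.
    std::vector<std::map<ptrdiff_t, double>> acc(chunk);
    for (size_t k = 0; k < vals.size(); ++k)
        acc[rows[k]][cols[k]] += vals[k];

    ptr.assign(chunk + 1, 0);
    for (ptrdiff_t i = 0; i < chunk; ++i)
        ptr[i + 1] = ptr[i] + (ptrdiff_t)acc[i].size();

    col.clear(); col.reserve(ptr[chunk]);
    val.clear(); val.reserve(ptr[chunk]);
    for (ptrdiff_t i = 0; i < chunk; ++i) {
        for (const auto &c : acc[i]) {
            col.push_back(c.first);
            val.push_back(c.second);
        }
    }
}
```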
OK, perhaps I can ensure the assembly; however, I am not sure it will be efficient to gather the entire row on a single processor. This partition occurs naturally in an FE system; is there a way to allow for this type of (block) partition in amgcl?
From what I reckon, internally in amgcl strips of rows are in fact converted into blocks before solving, isn't that true?
is there a way to allow for this type of (block) partition in amgcl?
No, there is not.
From what I reckon, internally in amgcl strips of rows are in fact converted into blocks before solving, isn't that true?
No. amgcl does separate local and remote columns (diagonal and off-diagonal blocks), but it does not transfer the ownership of columns to other processes. A row is completely owned by a single processor.
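For intuition only, a toy version (not amgcl's actual code) of that split: for a process owning the global row range [row_beg, row_end), columns inside the range go to the diagonal (local) block, and everything else, keeping its global column index, goes to the off-diagonal (remote) block.

```cpp
#include <tuple>
#include <vector>

typedef std::tuple<ptrdiff_t, ptrdiff_t, double> Entry; // (row, col, value)

// Split a local strip of rows [row_beg, row_end) into diagonal (local) and
// off-diagonal (remote) blocks. Row indices become local in both blocks;
// columns are renumbered locally in the diagonal block and stay global in
// the off-diagonal one.
void split_strip(
        ptrdiff_t row_beg, ptrdiff_t row_end,
        const std::vector<ptrdiff_t> &ptr,  // local CRS row pointers
        const std::vector<ptrdiff_t> &col,  // global column indices
        const std::vector<double>    &val,
        std::vector<Entry> &local_block,
        std::vector<Entry> &remote_block)
{
    for (ptrdiff_t i = 0; i < row_end - row_beg; ++i) {
        for (ptrdiff_t k = ptr[i]; k < ptr[i + 1]; ++k) {
            if (col[k] >= row_beg && col[k] < row_end)
                local_block.emplace_back(i, col[k] - row_beg, val[k]);
            else
                remote_block.emplace_back(i, col[k], val[k]);
        }
    }
}
```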
If I am able to pre-split the matrix into local and remote parts, do I still need to ensure that a contiguous chunk of rows belongs entirely to a single processor?
Locating an entire row on a single processor appears to be too much of a restriction for a general solver.
Just to let you know, I am able to solve now, after taking a huge communication hit reshuffling rows back to their respective procs before passing the matrix to AMGCL.
do I still need to ensure that a contiguous chunk of rows belongs entirely to a single processor?
yes
Locating an entire row on a single processor appears to be too much of a restriction for a general solver. Just to let you know, I am able to solve now, after taking a huge communication hit reshuffling rows back to their respective procs before passing the matrix to AMGCL.
Somebody has to do the reordering anyway. You can try to express the matrix permutation operation in terms of a matrix-matrix product, similar to how it is done here:
If that works better than your own implementation, then we can see if it is possible to make it generic enough in order to include it in the library API.
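One way to read that suggestion (a sketch under my own assumptions, not the library's API): build a permutation matrix P with a single unit entry per row, P(i, perm[i]) = 1, so that the reordering becomes A_new = P^T A P and rhs_new = P^T rhs, and both products can be computed with whatever distributed sparse matrix-matrix product is available.

```cpp
#include <vector>

// Build the CRS arrays of a permutation matrix P with P(i, perm[i]) = 1,
// where perm maps the current (scattered) row numbering to the target
// owner-contiguous numbering. Then A_new = P^T * A * P, rhs_new = P^T * rhs,
// and the solution maps back with x = P * x_new.
void permutation_matrix(
        const std::vector<ptrdiff_t> &perm,
        std::vector<ptrdiff_t> &ptr,
        std::vector<ptrdiff_t> &col,
        std::vector<double>    &val)
{
    ptrdiff_t n = perm.size();

    ptr.resize(n + 1);
    col.resize(n);
    val.assign(n, 1.0);

    for (ptrdiff_t i = 0; i < n; ++i) {
        ptr[i] = i;        // exactly one nonzero per row
        col[i] = perm[i];
    }
    ptr[n] = n;
}
```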
Hi Denis
I am assembling the FE equations locally on each processor (or thread). What ingredients do I need to solve the system in parallel? It appears to be solving on each proc independently.
I have the local-to-global map but cannot figure out how to provide it to the solver, which needs it to solve on all procs in tandem.
Can you provide any pointers? examples?
Best, -saurabh