AmgXWrapper: An interface between PETSc and the NVIDIA AmgX library

PETSc interface suggestions #16

Open jedbrown opened 7 years ago

jedbrown commented 7 years ago

The present use of MatGetSubMatrix looks awkward and appears to waste significant memory, particularly when there are several MPI processes per device. MatGetSubMatrices does more precisely what you want: it returns a sequential matrix with full rows only on the processes that request it (processes that are not part of gpuWorld can pass empty index sets).
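A minimal sketch of what calling MatGetSubMatrices directly could look like; `A` is assumed to be the assembled MPIAIJ system matrix, `onGpuWorld` is a hypothetical flag, and for illustration each gpuWorld rank requests only its own rows (the wrapper would request every row destined for its device):

```c
#include <petscmat.h>

IS             rowIS, colIS;
Mat           *subA;                 /* array of length 1; SEQAIJ on return */
PetscInt       rstart, rend, M, N;
PetscErrorCode ierr;

ierr = MatGetSize(A, &M, &N); CHKERRQ(ierr);
ierr = MatGetOwnershipRange(A, &rstart, &rend); CHKERRQ(ierr);

if (onGpuWorld) {
  /* request the rows this rank will hand to AmgX, with all columns */
  ierr = ISCreateStride(PETSC_COMM_SELF, rend - rstart, rstart, 1, &rowIS); CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF, N, 0, 1, &colIS); CHKERRQ(ierr);
} else {
  /* ranks outside gpuWorld still participate, but with empty index sets */
  ierr = ISCreateStride(PETSC_COMM_SELF, 0, 0, 1, &rowIS); CHKERRQ(ierr);
  ierr = ISCreateStride(PETSC_COMM_SELF, 0, 0, 1, &colIS); CHKERRQ(ierr);
}
ierr = MatGetSubMatrices(A, 1, &rowIS, &colIS, MAT_INITIAL_MATRIX, &subA); CHKERRQ(ierr);
/* subA[0] is a sequential matrix holding only the requested rows */
```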

The use of VecGetArray for the RHS will be problematic if the RHS has been "locked" (e.g., by a higher-level solve in PETSc). I recommend VecGetArrayRead.
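A small sketch of the read-only access pattern, assuming `b` is the Vec carrying the right-hand side:

```c
#include <petscvec.h>

const PetscScalar *rhs;
PetscInt           n;
PetscErrorCode     ierr;

ierr = VecGetLocalSize(b, &n); CHKERRQ(ierr);
ierr = VecGetArrayRead(b, &rhs); CHKERRQ(ierr);
/* ... copy/upload the n local entries of rhs to AmgX ... */
ierr = VecRestoreArrayRead(b, &rhs); CHKERRQ(ierr);
```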

piyueh commented 7 years ago

Let me see if my understanding is correct: MatGetSubMatrix returns a parallel matrix living on the original communicator, while MatGetSubMatrices returns an array of sequential matrices on each process (in our case we only need one sequential matrix, i.e., the array has length one). So MatGetSubMatrix requires more memory because a parallel matrix stores additional information even on processes that own no rows?

jedbrown commented 7 years ago

So MPIAIJ matrices are stored with the diagonal block separate from the off-diagonal block (to overlap communication of the halo needed by the vector). MatMPIAIJGetLocalMat takes that partitioned representation and gives you a single matrix with global column space -- that is a copy. But inside the implementation of MatGetSubMatrix_MPIAIJ, the usage here is currently going to take a path that calls a form of MatGetSubMatrices and converts the unpartitioned representation into MPIAIJ. If you call MatGetSubMatrices directly, you eliminate the memory and overhead of converting the submatrices to partitioned format and back to unpartitioned.
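For comparison, a minimal sketch of the MatMPIAIJGetLocalMat copy mentioned above (`A` again being the assembled MPIAIJ matrix):

```c
#include <petscmat.h>

Mat            Aloc;   /* SEQAIJ: this rank's rows over the global column space */
PetscErrorCode ierr;

ierr = MatMPIAIJGetLocalMat(A, MAT_INITIAL_MATRIX, &Aloc); CHKERRQ(ierr);
/* ... use Aloc, then free the copy ... */
ierr = MatDestroy(&Aloc); CHKERRQ(ierr);
```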