Closed ztdepztdep closed 2 years ago
The setup phase of amgcl is performed on the CPU, so if you are rebuilding the AMG hierarchy on each time step, then you have to send the matrix to the amgcl constructor in CPU memory. If you are reusing the same preconditioner during multiple time steps (and using the `solve(A, f, x)` overload as opposed to `solve(f, x)`), then you could in principle construct the matrix on the GPU directly. Note that the matrix constructors in both the cuda and vexcl backends still take the matrix as CRS arrays in CPU memory, but you can use your own matrix structure as long as you specialize the `backend::spmv_impl` and `backend::residual_impl` templates.
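To make the customization point concrete, here is a minimal, self-contained sketch of the operation a `backend::spmv_impl` specialization has to provide for a custom matrix type: the product `y = alpha * A * x + beta * y` over CRS arrays. The `CrsMatrix` struct is a hypothetical stand-in, not amgcl's own type, and a real GPU backend would run this loop as a device kernel over GPU memory rather than plain CPU code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CRS (compressed row storage) matrix, the general format
// the cuda/vexcl backend constructors expect as input.
struct CrsMatrix {
    std::size_t n;                 // number of rows
    std::vector<std::size_t> ptr;  // row pointers, size n+1
    std::vector<std::size_t> col;  // column indices of nonzeros
    std::vector<double>      val;  // nonzero values
};

// The operation an spmv_impl specialization must compute:
// y = alpha * A * x + beta * y.
void crs_spmv(double alpha, const CrsMatrix &A,
              const std::vector<double> &x,
              double beta, std::vector<double> &y)
{
    for (std::size_t i = 0; i < A.n; ++i) {
        double sum = 0.0;
        for (std::size_t j = A.ptr[i]; j < A.ptr[i + 1]; ++j)
            sum += A.val[j] * x[A.col[j]];
        y[i] = alpha * sum + beta * y[i];
    }
}
```

A `backend::residual_impl` specialization is the closely related operation `r = f - A * x`, which can be expressed through the same kernel.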
@ztdepztdep
Please let me know when you find an "on-GPU" sparse linear algebra library (one that limits PCIe data transfers to the first time step plus GPU<->GPU exchange of domain-boundary data) with MPI support for distributed-memory, multi-GPU simulations. I am also looking into that.
Hello! I have two additional questions:
Thank you in advance and good luck with future library development.
The rebuild method still works on the CPU side, so you need to have the matrix available there. When `precond.allow_rebuild` is set to true, the CPU representation of the transfer matrices `P` and `R` is kept alongside their GPU representation, and is used during the rebuild to propagate the new system matrix down the hierarchy via the Galerkin operator `A_c = R A_f P`. The matrix-matrix products in the Galerkin operator are performed on the CPU side, and the results are moved into GPU memory. So there is no shortcut that would let you modify the system matrix directly in GPU memory.
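For reference, here is a small self-contained sketch of the Galerkin triple product `A_c = R A_f P` that the rebuild performs. amgcl of course does this with sparse CRS matrices; dense row-major arithmetic is used here only to show what is being computed, with `R` sized `nc x nf`, `A_f` sized `nf x nf`, and `P` sized `nf x nc`.

```cpp
#include <cstddef>
#include <vector>

// Dense row-major product C = A * B, with A of size ar x ac and
// B of size ac x bc.  Stands in for the sparse products amgcl uses.
std::vector<double> matmul(const std::vector<double> &A,
                           std::size_t ar, std::size_t ac,
                           const std::vector<double> &B, std::size_t bc)
{
    std::vector<double> C(ar * bc, 0.0);
    for (std::size_t i = 0; i < ar; ++i)
        for (std::size_t k = 0; k < ac; ++k)
            for (std::size_t j = 0; j < bc; ++j)
                C[i * bc + j] += A[i * ac + k] * B[k * bc + j];
    return C;
}

// Galerkin coarse operator A_c = R * A_f * P, producing an nc x nc matrix.
std::vector<double> galerkin(const std::vector<double> &R,
                             const std::vector<double> &Af,
                             const std::vector<double> &P,
                             std::size_t nf, std::size_t nc)
{
    return matmul(matmul(R, nc, nf, Af, nf), nc, nf, P, nc);
}
```

With `allow_rebuild`, this product is recomputed on the CPU for every level whenever the fine-level matrix changes, which is why the updated system matrix must be supplied in CPU memory.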
I need to recompute the matrix A on every time step, and transferring the data between the GPU and the CPU is very costly. So how can I construct the matrix A on the GPU directly?