Open klausbu opened 4 years ago
Both the setup and solve phases in AmgX are performed on the GPU unless you tell AmgX to run on the host.
It is straightforward to run with GPU-resident data structures using the same API calls you would use for host-resident data structures. For example, with AMGX_matrix_upload_distributed you can pass either device pointers or host pointers for the CSR matrix column indices, row offsets, and values.
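To make that concrete, here is a minimal sketch of a single-GPU solve where the CSR arrays already live in device memory. The buffer names (`d_row_ptrs` etc.) and the config file path are placeholders, error checking is omitted, and the same idea carries over to AMGX_matrix_upload_distributed in the multi-GPU case.

```c
/* Minimal sketch (single GPU, AMGX_mode_dDDI): the CSR arrays live in device
 * memory, but the upload call is the same one you would use for host arrays --
 * AmgX detects the pointer location. Buffer names and the config path are
 * placeholders; error checking omitted. */
#include <amgx_c.h>

void solve_on_device(int n, int nnz,
                     const int    *d_row_ptrs,    /* device pointer, n+1 entries */
                     const int    *d_col_indices, /* device pointer, nnz entries */
                     const double *d_values,      /* device pointer, nnz entries */
                     const double *d_rhs,         /* device pointer, n entries   */
                     double       *d_x)           /* device pointer, n entries   */
{
    AMGX_config_handle    cfg;
    AMGX_resources_handle rsrc;
    AMGX_matrix_handle    A;
    AMGX_vector_handle    b, x;
    AMGX_solver_handle    solver;

    AMGX_initialize();
    AMGX_config_create_from_file(&cfg, "config.json");  /* your solver config */
    AMGX_resources_create_simple(&rsrc, cfg);

    AMGX_matrix_create(&A, rsrc, AMGX_mode_dDDI);
    AMGX_vector_create(&b, rsrc, AMGX_mode_dDDI);
    AMGX_vector_create(&x, rsrc, AMGX_mode_dDDI);
    AMGX_solver_create(&solver, rsrc, AMGX_mode_dDDI, cfg);

    /* Device pointers go straight in -- no staging copies back to the host. */
    AMGX_matrix_upload_all(A, n, nnz, 1, 1, d_row_ptrs, d_col_indices, d_values, NULL);
    AMGX_vector_upload(b, n, 1, d_rhs);
    AMGX_vector_upload(x, n, 1, d_x);        /* initial guess */

    AMGX_solver_setup(solver, A);
    AMGX_solver_solve(solver, b, x);
    AMGX_vector_download(x, d_x);            /* AmgX detects the pointer location */

    AMGX_solver_destroy(solver);
    AMGX_vector_destroy(x);
    AMGX_vector_destroy(b);
    AMGX_matrix_destroy(A);
    AMGX_resources_destroy(rsrc);
    AMGX_config_destroy(cfg);
    AMGX_finalize();
}
```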
Hey @klausbu, do you have any follow-up questions after Matthew's comment?
@Matthew
Let's assume we have a workstation with 4 GPUs and a CFD case large enough to keep them busy:
What is the multi-GPU concept, i.e. how are distributed-memory cases set up/computed/programmed, e.g. using MPI?
What are strategies for keeping simulations on the GPU rather than moving data back and forth between CPU and GPU, e.g. in each timestep? Caching might be a keyword.
I understand from a paper that you are working on an OpenFOAM implementation extending PETSc. What is the benefit (if any) of going through PETSc?
The distributed model for this application leverages MPI, which is currently the standard option for such problems. You can domain-decompose your problem and pass it to AmgX, and the library will handle the communication necessary for the linear solve. Have a look at https://github.com/barbagroup/AmgXWrapper for examples of how this could be set up for a CFD code; a rough per-rank sketch follows below.
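Here is a rough per-rank sketch modelled on the kind of flow used by the bundled MPI example and AmgXWrapper: one MPI rank drives one GPU, each rank uploads only its own partition of the matrix (with global column indices), and AmgX handles the halo exchanges during the solve. The function name, the contiguous-partitioning assumption, and the argument lists are from memory, so check amgx_c.h for the authoritative signatures.

```c
/* Per-rank distributed setup: one MPI rank per GPU. The row_ptrs /
 * col_indices_global / values / rhs arguments stand for your rank-local,
 * domain-decomposed data and may be host or device pointers. Argument lists
 * are from memory -- verify against amgx_c.h. */
#include <mpi.h>
#include <amgx_c.h>
#include <cuda_runtime.h>
#include <stdint.h>

void amgx_mpi_solve(MPI_Comm comm, const char *cfg_file,
                    int n_global, int n_local, int nnz_local,
                    const int     *row_ptrs,           /* local CSR row offsets */
                    const int64_t *col_indices_global, /* global column indices */
                    const double  *values,
                    const double  *rhs, double *sol)
{
    int rank, ndevices;
    MPI_Comm_rank(comm, &rank);
    cudaGetDeviceCount(&ndevices);
    int device = rank % ndevices;     /* e.g. 4 ranks <-> 4 GPUs on one workstation */
    cudaSetDevice(device);

    AMGX_initialize();
    AMGX_config_handle cfg;
    AMGX_config_create_from_file(&cfg, cfg_file);

    /* The resources object carries the MPI communicator and the device id;
     * AmgX uses them for the communication during setup and solve. */
    AMGX_resources_handle rsrc;
    AMGX_resources_create(&rsrc, cfg, &comm, 1, &device);

    AMGX_matrix_handle A;  AMGX_vector_handle b, x;  AMGX_solver_handle solver;
    AMGX_matrix_create(&A, rsrc, AMGX_mode_dDDI);
    AMGX_vector_create(&b, rsrc, AMGX_mode_dDDI);
    AMGX_vector_create(&x, rsrc, AMGX_mode_dDDI);
    AMGX_solver_create(&solver, rsrc, AMGX_mode_dDDI, cfg);

    /* Upload the rank-local block; the NULL partition vector assumes a
     * contiguous row partitioning across ranks. AMGX_matrix_upload_distributed
     * is the newer alternative to this call. */
    AMGX_matrix_upload_all_global(A, n_global, n_local, nnz_local, 1, 1,
                                  row_ptrs, col_indices_global, values, NULL,
                                  1, 1, NULL);
    AMGX_vector_bind(b, A);           /* share the matrix's distribution */
    AMGX_vector_bind(x, A);
    AMGX_vector_upload(b, n_local, 1, rhs);
    AMGX_vector_upload(x, n_local, 1, sol);

    AMGX_solver_setup(solver, A);
    AMGX_solver_solve(solver, b, x);
    AMGX_vector_download(x, sol);

    AMGX_solver_destroy(solver);  AMGX_vector_destroy(x);  AMGX_vector_destroy(b);
    AMGX_matrix_destroy(A);  AMGX_resources_destroy(rsrc);  AMGX_config_destroy(cfg);
    AMGX_finalize();
}
```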
Keeping data on the GPU essentially depends on your particular application. For the CFD applications I have worked with, the most important aim from a performance perspective is usually to have the outer cycle/timestep loop processed in its entirety on the GPU. You pass the constructed matrices, velocities, pressures, volumes, or whatever quantities will be processed, to the GPU before the cycle and only bring data back at the end (of course some data is still communicated, e.g. halos, scalars for PCG, etc.). Since you can pass device pointers to AmgX via the API, it is possible to avoid ping-pong movement of data between CPU and GPU.
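A rough sketch of that timestep-loop pattern, assuming the sparsity pattern is fixed and your own CUDA kernels rebuild the coefficients and right-hand side directly in device memory (the assemble_* hooks below are hypothetical placeholders for application code):

```c
/* Keep the outer loop GPU-resident: the CSR structure is uploaded once
 * before the loop; inside the loop only device-resident value/RHS buffers
 * are handed to AmgX, so the system never bounces through host memory. */
#include <amgx_c.h>

/* Hypothetical application hooks that rebuild the system on the GPU. */
extern void assemble_matrix_on_device(double *d_values, int nnz, double dt);
extern void assemble_rhs_on_device(double *d_rhs, int n, double dt);

void time_loop(AMGX_matrix_handle A, AMGX_vector_handle b, AMGX_vector_handle x,
               AMGX_solver_handle solver,
               int n, int nnz,
               double *d_values, double *d_rhs, double *d_x,  /* device buffers */
               int nsteps, double dt)
{
    for (int step = 0; step < nsteps; ++step)
    {
        /* Rebuild coefficients and RHS directly in device memory. */
        assemble_matrix_on_device(d_values, nnz, dt);
        assemble_rhs_on_device(d_rhs, n, dt);

        /* Sparsity pattern is unchanged, so only the values are replaced;
         * no host <-> device copies of the CSR arrays are needed. */
        AMGX_matrix_replace_coefficients(A, n, nnz, d_values, NULL);
        AMGX_vector_upload(b, n, 1, d_rhs);
        AMGX_vector_upload(x, n, 1, d_x);   /* previous solution as initial guess */

        /* Re-run setup on the new coefficients, then solve. */
        AMGX_solver_setup(solver, A);
        AMGX_solver_solve(solver, b, x);

        AMGX_vector_download(x, d_x);       /* result stays in device memory */
    }
}
```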
Regarding OpenFOAM: we have worked on several projects related to OpenFOAM acceleration using AmgX. One internal project does indeed extend PETSc with an AmgX backend, but this isn't released yet (and there is currently no plan to release it).
Instead, the public work with OpenFOAM + AmgX leverages PETSc4FOAM, which was developed by the OpenFOAM HPC technical committee (in particular CINECA + ESI). I extended this functionality to also call into AmgX and perform some additional data transformations, the benefit being performance for the pressure solve. Work is still ongoing, and we are optimising for an increasing range of test problems.
Hello,
I am not sure about the underlying multi-GPU concept of AmgX.
The application I have in mind has the following features:
Are the AmgX solvers built on the GPU?
Is there an example of a distributed-memory/MPI implementation that could leverage AmgX as an external, purely on-GPU matrix solver library?
Klaus