HenningScheufler opened 2 months ago
I think the answer is yes. If a code is built for GPU, ParallelFor etc. will launch a kernel on the GPU. If the user needs to run on the CPU, they can use functions like LoopOnCpu etc. So a user's function can be run on either the CPU or the GPU, depending on which functions are used. We also have macros like AMREX_HOST_DEVICE_PARALLEL_FOR; with these, where the kernel runs depends on whether amrex::inGpuLaunchRegion is true or not. As for OpenMP, we often use a coarse-grained approach.
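To make the runtime-dispatch idea in the reply concrete, here is a minimal plain C++ sketch of the pattern: one loop body, two execution paths, chosen by a runtime flag. It does not use AMReX; `g_gpu_launch_region`, `Path`, and `host_device_parallel_for` are hypothetical names, and the "device" branch only marks where a kernel launch would go, so it runs on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical runtime flag standing in for "are we in a GPU launch region?".
// It is NOT an AMReX variable; it only illustrates a runtime decision.
static bool g_gpu_launch_region = true;

// Which execution path a call took (for illustration only).
enum class Path { Device, Host };

// Hypothetical helper: the same loop body is handed to one of two paths,
// picked at runtime. Plain C++, so the "device" path is a placeholder.
template <typename F>
Path host_device_parallel_for(int n, F&& body) {
    if (g_gpu_launch_region) {
        // A GPU build would launch a device kernel here.
        for (int i = 0; i < n; ++i) body(i);
        return Path::Device;
    }
    // Host fallback, similar in spirit to a CPU-only loop such as LoopOnCpu.
    for (int i = 0; i < n; ++i) body(i);
    return Path::Host;
}
```

Note that in this kind of scheme the set of available backends is still fixed when the code is compiled; only the choice among the compiled-in paths happens at runtime.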
From an end-user perspective, switching the executor (CPU, OpenMP, or GPU) at runtime, e.g. by changing a dictionary entry, is an important feature.
This can be achieved by an executor model that defines the execution space:
https://exasim-project.com/NeoFOAM/api/executor.html
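As a rough illustration of what such an executor model looks like, the backend can be made a runtime value with `std::variant` (C++17), selected from a string such as a dictionary entry, and dispatched with `std::visit`. This is a sketch under assumptions, not NeoFOAM's actual code: all type and function names are hypothetical, the OpenMP pragma is simply ignored when compiling without OpenMP, and the GPU executor is a host-side placeholder.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical executor types; each owns its own loop implementation.
struct SerialExecutor {
    std::string name() const { return "Serial"; }
    template <typename F> void forEach(int n, F&& f) const {
        for (int i = 0; i < n; ++i) f(i);
    }
};

struct OpenMPExecutor {
    std::string name() const { return "OpenMP"; }
    template <typename F> void forEach(int n, F&& f) const {
        // Ignored (serial loop) when compiled without OpenMP support.
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) f(i);
    }
};

struct GPUExecutor {
    std::string name() const { return "GPU"; }
    template <typename F> void forEach(int n, F&& f) const {
        // Placeholder: a real backend would launch a device kernel here.
        for (int i = 0; i < n; ++i) f(i);
    }
};

// The execution space is a runtime value, not a compile-time choice.
using Executor = std::variant<SerialExecutor, OpenMPExecutor, GPUExecutor>;

// Select the executor from a dictionary-style string entry.
Executor makeExecutor(const std::string& key) {
    if (key == "Serial") return SerialExecutor{};
    if (key == "OpenMP") return OpenMPExecutor{};
    if (key == "GPU") return GPUExecutor{};
    throw std::invalid_argument("unknown executor: " + key);
}

// Single entry point; std::visit forwards to the active executor's loop.
template <typename F>
void parallelFor(const Executor& exec, int n, F&& f) {
    std::visit([&](const auto& e) { e.forEach(n, f); }, exec);
}

// Report which backend is active.
std::string executorName(const Executor& exec) {
    return std::visit([](const auto& e) { return e.name(); }, exec);
}
```

User code then calls `parallelFor(makeExecutor(entry), n, body)` and never names a backend directly, which is what allows the switch to happen through input data rather than recompilation.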
In AMReX, the execution space appears to be defined at compile time, and compiling AMReX for CPU, OpenMP, and GPU in a single build does not seem to be the intended use case. However, this feature could be implemented by defining a custom Arena:
https://amrex-codes.github.io/amrex/doxygen/classamrex_1_1Arena.html
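The custom-Arena route amounts to choosing the allocator at runtime through a virtual interface. The sketch below shows that idea in plain C++; `ArenaBase`, `HostArena`, `DeviceArenaStub`, and `makeArena` are hypothetical and are NOT `amrex::Arena` or its subclasses. The device variant falls back to host memory so the example runs without a GPU toolchain; a real GPU build would call the vendor allocator there.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical allocator interface with virtual alloc/free (not amrex::Arena).
class ArenaBase {
public:
    virtual ~ArenaBase() = default;
    virtual void* alloc(std::size_t bytes) = 0;
    virtual void free(void* ptr) = 0;
    virtual std::string space() const = 0;
};

// Host memory via malloc; suitable for serial and OpenMP execution.
class HostArena : public ArenaBase {
public:
    void* alloc(std::size_t bytes) override { return std::malloc(bytes); }
    void free(void* ptr) override { std::free(ptr); }
    std::string space() const override { return "host"; }
};

// Placeholder for device memory: falls back to host malloc in this sketch.
class DeviceArenaStub : public ArenaBase {
public:
    void* alloc(std::size_t bytes) override { return std::malloc(bytes); }
    void free(void* ptr) override { std::free(ptr); }
    std::string space() const override { return "device (stub)"; }
};

// Pick the arena that matches the executor chosen at runtime.
std::unique_ptr<ArenaBase> makeArena(const std::string& executor) {
    if (executor == "Serial" || executor == "OpenMP")
        return std::make_unique<HostArena>();
    if (executor == "GPU")
        return std::make_unique<DeviceArenaStub>();
    throw std::invalid_argument("unknown executor: " + executor);
}
```

Choosing the arena at runtime only decides where the data lives; the loops that touch that data still need a matching runtime dispatch to run in the same execution space.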
But this probably means that the current AMReX parallel implementation could not be used, and the loops would need to be implemented with Kokkos (for this project: NeoFOAM).
Do you think it is possible to use AMReX with an executor model?