NanoComp / meep

free finite-difference time-domain (FDTD) software for electromagnetic simulations
GNU General Public License v2.0

"Free" (or at least cheap) hardware acceleration via OpenMP #1719

Open smartalecH opened 3 years ago

smartalecH commented 3 years ago

The hybrid OpenMP/MPI branch (#1628) uses various OpenMP directives to parallelize the computation (e.g. #pragma omp parallel for).

More recent versions of OpenMP support offloading the same computation onto hardware accelerators (e.g. GPUs) with very little modification to the existing compiler directives: target offloading was introduced in OpenMP 4.0 (2013) and extended in 4.5 and 5.0 (2018). We would just have to make sure that data meant to stay on the accelerator actually stays there long enough to amortize the cost of host-device transfers.
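
As a rough illustration of how small the directive change can be, here is a minimal sketch of a 1D Yee-style update. The function name step_fields_offload, the array layout, and the Courant factor c are assumptions made for this example and are not taken from Meep. The enclosing target data region is what keeps the arrays resident on the accelerator across time steps. Without offload support, or without OpenMP at all, the same code simply runs on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Meep): advance a 1D Yee grid by `steps`
// time steps. ez holds n samples, hy holds n-1 samples, c is the Courant factor.
void step_fields_offload(std::vector<double>& ez, std::vector<double>& hy,
                         int steps, double c) {
  assert(ez.size() >= 2 && hy.size() + 1 == ez.size());
  const int n = static_cast<int>(ez.size());
  double* e = ez.data();
  double* h = hy.data();

  // Copy both arrays to the device once, keep them there for every step,
  // and copy them back once when this region ends.
  #pragma omp target data map(tofrom: e[0:n], h[0:n-1])
  {
    for (int t = 0; t < steps; ++t) {
      // The host-only version of this loop would use: #pragma omp parallel for
      // The arrays are already present on the device, so the map clauses
      // below do not trigger a new transfer.
      #pragma omp target teams distribute parallel for \
          map(tofrom: e[0:n], h[0:n-1])
      for (int i = 0; i < n - 1; ++i)
        h[i] += c * (e[i + 1] - e[i]);

      #pragma omp target teams distribute parallel for \
          map(tofrom: e[0:n], h[0:n-1])
      for (int i = 1; i < n - 1; ++i)
        e[i] += c * (h[i] - h[i - 1]);
    }
  }
}
```

If the target data region were dropped, each inner target construct would copy both arrays to the device and back on every step, which is exactly the communication hit described above.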

For example, we could create a function called run_until(n) that timesteps continuously for n steps without any interruption (currently, run(until=n) calls back into Python on each iteration). All of the timestepping, DFT accumulation, etc. can be performed on the accelerator; even convergence checks can be performed there. The main benefit of using an accelerator for FDTD, of course, would be its extremely high memory bandwidth (FDTD is generally memory-bound, not compute-bound).
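
One possible shape for such a routine is sketched below. It is not Meep's API: the signature of run_until, the artificial loss factor, and the energy-based stopping rule are all assumptions chosen to keep the example small. The point is that the whole loop runs inside one device data region, and the convergence check is a reduction on the device that returns only a single scalar to the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch (not Meep's API): take up to n time steps and stop early
// once the field energy sum(ez^2) drops below tol. `loss` is an artificial
// damping factor so the toy problem decays. Returns the number of steps taken.
int run_until(std::vector<double>& ez, std::vector<double>& hy, int n,
              double c, double loss, int check_every, double tol) {
  assert(ez.size() >= 2 && hy.size() + 1 == ez.size() && check_every > 0);
  const int m = static_cast<int>(ez.size());
  double* e = ez.data();
  double* h = hy.data();
  int steps_taken = 0;
  bool converged = false;

  // Fields stay on the device for the entire run.
  #pragma omp target data map(tofrom: e[0:m], h[0:m-1])
  {
    for (int t = 1; t <= n && !converged; ++t) {
      #pragma omp target teams distribute parallel for \
          map(tofrom: e[0:m], h[0:m-1])
      for (int i = 0; i < m - 1; ++i)
        h[i] = loss * (h[i] + c * (e[i + 1] - e[i]));

      #pragma omp target teams distribute parallel for \
          map(tofrom: e[0:m], h[0:m-1])
      for (int i = 1; i < m - 1; ++i)
        e[i] = loss * (e[i] + c * (h[i] - h[i - 1]));

      steps_taken = t;
      if (t % check_every != 0) continue;

      // Convergence check on the device: only `energy` travels to the host.
      double energy = 0.0;
      #pragma omp target teams distribute parallel for \
          reduction(+: energy) map(tofrom: energy) map(to: e[0:m])
      for (int i = 0; i < m; ++i)
        energy += e[i] * e[i];
      converged = energy < tol;
    }
  }
  return steps_taken;
}
```

A larger check_every trades a slightly later stop for fewer host-device synchronizations.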

In the past, pursuing hardware acceleration was rather undesirable, as it required custom kernels written against a proprietary API. While some directive-level shortcuts have existed for a long time (e.g. OpenACC), there wasn't enough motivation to justify the time sink. However, since we are already playing with OpenMP, it might be worth extending (or at least exploring) the functionality to also support basic accelerators.

smartalecH commented 3 years ago

An alternative approach is to use a framework like Kokkos, which supports many different backends but requires both data placement and parallel computation to be expressed explicitly. This could potentially work much better than relying on the single OpenMP implementation shipped with a particular compiler.