When compiling for CPUs, `amrex::ParallelFor` only uses a single OpenMP thread. Some `ParallelFor` loops in HiPACE++ sit outside `MFIter` loops, so to use multiple threads we need to define our own `ParallelFor` that uses OpenMP when compiling for CPU.
When compiling for GPU, the normal `amrex::ParallelFor` is used.
The new OpenMP `ParallelFor` is used in both the beam and plasma pushers. The plasma pusher was already OpenMP-parallelized; the new version has the same performance but is cleaner (best reviewed with whitespace changes hidden). The beam pusher was not OpenMP-parallelized before and could be comparatively very slow when using many threads; now it is fast.
[ ] Small enough (< few 100s of lines), otherwise it should probably be split into smaller PRs
[ ] Tested (describe the tests in the PR description)
[ ] Runs on GPU (basic: the code compiles and runs well with the new module)
[ ] Contains an automated test (checksum and/or comparison with theory)
[ ] Documented: all elements (classes and their members, functions, namespaces, etc.) are documented