Frontier GPU-aware MPI - GitHub Issues

sbryngelson commented 2 months ago

Support and document GPU-aware MPI on Frontier.

When not oversubscribing the GPUs (one MPI rank per GPU), make sure we launch with srun -c 7 --gpus-per-task=1 --gpu-bind=closest. Because each NIC is connected directly to a GPU, getting this affinity right is very important.
- The current Frontier template does not appear to set all of these: https://github.com/MFlowCode/MFC/blob/master/toolchain/templates/frontier.mako
Since we already build everything in, enabling this should just be a matter of setting the runtime flags.
- Runtime flag to set: MPICH_GPU_SUPPORT_ENABLED=1
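
A minimal sketch of what the launch line could look like with these settings. The helper name `frontier_srun_cmd`, the rank count, and the executable `./simulation` are placeholders invented for illustration, not what the template does today. It prints the command rather than running it, so it can be checked off-cluster.

```shell
# Enable GPU-aware MPI in Cray MPICH (the runtime flag named above).
export MPICH_GPU_SUPPORT_ENABLED=1

# Hypothetical helper: build the srun command with the affinity flags above.
# $1 = number of MPI ranks (one per GPU); remaining args = executable and its arguments.
frontier_srun_cmd() {
    local ntasks="$1"
    shift
    echo "srun -n ${ntasks} -c 7 --gpus-per-task=1 --gpu-bind=closest $*"
}

# Print the command instead of executing it (placeholder rank count and binary).
frontier_srun_cmd 8 ./simulation
```

In a real batch script, the printed line would be executed directly inside the job allocation.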

sbryngelson commented 1 month ago

From some old 2021 docs:

• Environment variable, CRAY_ACC_USE_UNIFIED_MEM=1
• CCE offloading runtime library will auto-detect user-allocations of pinned or managed memory
• No explicit allocations or transfers will be issued for such memory
• Original pointers passed directly into GPU kernels
• CRAY_ACC_DEBUG runtime messages reflect this capability

https://www.olcf.ornl.gov/wp-content/uploads/2021/04/2021-05-20-Frontier-Tutorial-CCE.pdf
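
A rough sketch of how one might try this out: set the variable from the slides and turn on CRAY_ACC_DEBUG to see whether explicit allocations and transfers disappear from the runtime messages. The debug level chosen here is an assumption, not something the slides specify.

```shell
# From the 2021 slides: let the CCE offload runtime use pinned/managed memory in place.
export CRAY_ACC_USE_UNIFIED_MEM=1

# Assumption: level 2 is verbose enough to show allocation and transfer activity.
export CRAY_ACC_DEBUG=2

# Echo the settings so they appear in the job log next to the runtime messages.
echo "CRAY_ACC_USE_UNIFIED_MEM=${CRAY_ACC_USE_UNIFIED_MEM} CRAY_ACC_DEBUG=${CRAY_ACC_DEBUG}"
```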

sbryngelson commented 1 month ago

From here: https://www.openmp.org/wp-content/uploads/2022-04-29-ECP-OMP-Telecon-HPE-Compiler.pdf

CCE OPENMP UNIFIED SHARED MEMORY SUPPORT FOR AMD MI250X

Dynamically enable GPU unified memory for OpenMP map clauses
• Set env vars CRAY_ACC_USE_UNIFIED_MEM=1 and HSA_XNACK=1
• Skips explicit allocate/transfer for all system memory
• Global “declare target” variables will still be allocated separately (compiler statically emits a device copy)

Statically enable GPU unified memory for OpenMP map clauses
• Compile with “requires unified_shared_memory” directive
• Set env var HSA_XNACK=1
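
The two modes from the slide differ only in which environment variables are set at run time. Here is a sketch of a job-script helper that switches between them. The function name `set_unified_mem_mode` is hypothetical, and unsetting CRAY_ACC_USE_UNIFIED_MEM in the static case is a choice made for this sketch to keep the two modes distinct, not something the slide requires.

```shell
# Hypothetical helper: select a unified-memory mode for the CCE offload runtime.
set_unified_mem_mode() {
    case "$1" in
        dynamic)
            # Runtime opt-in via environment variables, as listed on the slide.
            export CRAY_ACC_USE_UNIFIED_MEM=1
            export HSA_XNACK=1
            ;;
        static)
            # The code itself must be compiled with the
            # "requires unified_shared_memory" directive;
            # only XNACK is set at run time.
            unset CRAY_ACC_USE_UNIFIED_MEM
            export HSA_XNACK=1
            ;;
        *)
            echo "usage: set_unified_mem_mode dynamic|static" >&2
            return 1
            ;;
    esac
}

# Example: pick the dynamic mode and log the resulting settings.
set_unified_mem_mode dynamic
echo "CRAY_ACC_USE_UNIFIED_MEM=${CRAY_ACC_USE_UNIFIED_MEM:-unset} HSA_XNACK=${HSA_XNACK}"
```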

MFlowCode / MFC

Frontier GPU-aware MPI #405