Closed sbryngelson closed 1 month ago
From some old 2021 docs:
• Environment variable, CRAY_ACC_USE_UNIFIED_MEM=1
• CCE offloading runtime library will auto-detect user-allocations of pinned or managed memory
• No explicit allocations or transfers will be issued for such memory
• Original pointers passed directly into GPU kernels
• CRAY_ACC_DEBUG runtime messages reflect this capability
https://www.olcf.ornl.gov/wp-content/uploads/2021/04/2021-05-20-Frontier-Tutorial-CCE.pdf
From here: https://www.openmp.org/wp-content/uploads/2022-04-29-ECP-OMP-Telecon-HPE-Compiler.pdf
CCE OPENMP UNIFIED SHARED MEMORY SUPPORT FOR AMD MI250X
• Dynamically enable GPU unified memory for OpenMP map clauses
  • Set env vars CRAY_ACC_USE_UNIFIED_MEM=1 and HSA_XNACK=1
  • Skips explicit allocate/transfer for all system memory
  • Global “declare target” variables will still be allocated separately (compiler statically emits a device copy)
• Statically enable GPU unified memory for OpenMP map clauses
  • Compile with “requires unified_shared_memory” directive
  • Set env var HSA_XNACK=1
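A minimal sketch of how the two modes above might be selected in a job script. The helper name `usm_env` is hypothetical (not part of CCE or any module), and the value used for `CRAY_ACC_DEBUG` is an assumed verbosity level; the variable names themselves come from the slides quoted above.

```shell
#!/bin/sh
# usm_env is a hypothetical helper: it prints the export lines for one mode.
usm_env() {
    mode="$1"   # "dynamic" or "static"
    case "$mode" in
        dynamic)
            # Runtime opt-in: map clauses skip explicit allocate/transfer.
            echo "export CRAY_ACC_USE_UNIFIED_MEM=1"
            echo "export HSA_XNACK=1"
            ;;
        static)
            # Code compiled with "requires unified_shared_memory":
            # only XNACK has to be enabled at run time.
            echo "export HSA_XNACK=1"
            ;;
        *)
            echo "usage: usm_env dynamic|static" >&2
            return 1
            ;;
    esac
}

# Apply the dynamic-mode settings to the current shell.
eval "$(usm_env dynamic)"

# Assumed verbosity level; the runtime messages should show whether
# explicit allocations and transfers were skipped.
export CRAY_ACC_DEBUG=1
```

In either mode, global “declare target” variables still get their own device copy, so the debug output may continue to show activity for those.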
Support and document GPU-aware MPI on Frontier.
If not oversubscribing the GPUs, make sure we run with `-c 7 --gpus-per-task=1 --gpu-bind=closest`. Because the NIC is connected directly to the GPU, affinity is very important. Since we are already building everything in, it should just be a matter of setting the runtime flags: `MPICH_GPU_SUPPORT_ENABLED=1`.
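Put together, a launch line might look like the sketch below. `NNODES`, `RANKS_PER_NODE`, and `./app` are placeholders, and the choice of 8 ranks per node (one per GCD) is an assumption consistent with `-c 7`, not something stated in this issue. The script prints the command rather than running it, so it can be checked outside a Slurm allocation.

```shell
#!/bin/sh
# Placeholders for illustration; adjust to the actual job.
NNODES="${NNODES:-1}"
RANKS_PER_NODE=8   # assumption: one rank per GCD, 8 GCDs per node
NRANKS=$((NNODES * RANKS_PER_NODE))

# Let Cray MPICH accept GPU buffers in MPI calls.
export MPICH_GPU_SUPPORT_ENABLED=1

# 7 cores per rank, one GPU per rank, each rank bound to its closest GPU.
LAUNCH="srun -N $NNODES -n $NRANKS -c 7 --gpus-per-task=1 --gpu-bind=closest ./app"

# Print instead of executing; replace echo with eval inside a real job.
echo "$LAUNCH"
```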