Alpine-DAV / ascent

A flyweight in situ visualization and analysis runtime for multi-physics HPC simulations
https://alpine-dav.github.io/ascent/

build recipe requests for summit and frontier #1192

Open cyrush opened 10 months ago

cyrush commented 10 months ago

frontier public install for WarpX

Compatible with their new process.

https://warpx.readthedocs.io/en/latest/install/hpc/frontier.html

Also, add info on how to build this to the WarpX docs.

NekRS requests (2023/08/04)

gnu + cuda builds Summit

Summit (mpicc/mpic++/mpif77)

module load gcc make cuda

 1) lsf-tools/2.0   3) xalt/1.2.1   5) git-lfs/2.11.0   7) cmake/3.23.2              9) nsight-systems/2021.3.1.54  11) spectrum-mpi/10.4.0.3-20210112
 2) hsi/5.0.2.p5    4) DefApps      6) gcc/9.1.0        8) nsight-compute/2021.2.1  10) cuda/11.0.3
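With those modules loaded, a gnu + cuda build can be sketched with the repo's `build_ascent` helper script. This is a sketch only: the `enable_*`/`CUDA_ARCH` environment variables follow the script's conventions but may differ between Ascent versions, so check `scripts/build_ascent/` in your checkout first (`CUDA_ARCH=70` assumes Summit's V100s).

```shell
# Sketch: gcc + CUDA Ascent build on Summit via the build_ascent helper.
# Variable names (enable_mpi, enable_cuda, CUDA_ARCH) are assumptions --
# confirm against scripts/build_ascent/build_ascent.sh for your version.
module load gcc make cuda cmake

git clone --recursive https://github.com/Alpine-DAV/ascent.git
cd ascent/scripts/build_ascent

env enable_mpi=ON enable_cuda=ON CUDA_ARCH=70 ./build_ascent.sh
```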

gnu + hip builds on Frontier

Frontier (cc/CC/ftn)

module load PrgEnv-gnu
module load craype-accel-amd-gfx90a
module load cray-mpich
module load rocm
module unload cray-libsci

 1) craype-x86-trento        5) xpmem/2.6.2-2.5_2.22__gd067c3f.shasta   9) craype/2.7.19          13) hsi/default              17) rocm/5.3.0
 2) libfabric/1.15.2.0       6) cray-pmi/6.1.8                         10) cray-dsmml/0.2.2       14) DefApps/default
 3) craype-network-ofi       7) gnuplot/5.4.3                          11) PrgEnv-gnu/8.3.3       15) craype-accel-amd-gfx90a
 4) perftools-base/22.12.0   8) gcc/12.2.0                             12) darshan-runtime/3.4.0  16) cray-mpich/8.1.23
nicolemarsaglia commented 9 months ago

Updates

nicolemarsaglia commented 9 months ago

NekRS on Frontier building Camp:

[ 22%] Building CXX object blt/tests/smoke/CMakeFiles/blt_hip_smoke.dir/blt_hip_smoke.cpp.o
cd /autofs/nccs-svm1_sw/summit/ums/ums010/2023_01/frontier/ascent_nekrs/build/camp-2022.10.1/blt/tests/smoke && /opt/cray/pe/craype/2.7.19/bin/CC -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -isystem /opt/rocm-5.3.0/include -isystem /opt/rocm-5.3.0/llvm/lib/clang/15.0.0/include/.. -Wall -Wextra      -O3 -DNDEBUG -fPIE --rocm-path=/opt/rocm-5.3.0 -x hip --offload-arch=gfx90a -std=c++17 -MD -MT blt/tests/smoke/CMakeFiles/blt_hip_smoke.dir/blt_hip_smoke.cpp.o -MF CMakeFiles/blt_hip_smoke.dir/blt_hip_smoke.cpp.o.d -o CMakeFiles/blt_hip_smoke.dir/blt_hip_smoke.cpp.o -c /autofs/nccs-svm1_sw/summit/ums/ums010/2023_01/frontier/ascent_nekrs/camp-2022.10.1/extern/blt/tests/smoke/blt_hip_smoke.cpp
g++: error: unrecognized command-line option '--rocm-path=/opt/rocm-5.3.0'
g++: error: unrecognized command-line option '--offload-arch=gfx90a'
make[2]: *** [blt/tests/smoke/CMakeFiles/blt_hip_smoke.dir/build.make:79: blt/tests/smoke/CMakeFiles/blt_hip_smoke.dir/blt_hip_smoke.cpp.o] Error 1
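The smoke-test failure can be reproduced outside the Camp build: the Cray `CC` wrapper forwards the clang-style HIP flags to `g++`, which simply does not know them. A minimal check with any recent `g++` (the file and flag values here are illustrative):

```shell
# g++ (unlike clang/hipcc) has no HIP support, so the clang-style
# offload flags from the BLT HIP smoke test are unknown options to it.
echo 'int main(void){return 0;}' > /tmp/hip_flag_check.c
if g++ --rocm-path=/opt/rocm --offload-arch=gfx90a \
      -c /tmp/hip_flag_check.c -o /tmp/hip_flag_check.o \
      2> /tmp/hip_flag_check.err; then
  echo "g++ accepted the HIP flags (unexpected)"
else
  echo "g++ rejected the HIP flags, matching the Frontier error"
fi
cat /tmp/hip_flag_check.err
```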
nicolemarsaglia commented 9 months ago

Bad news for gnu + hip on Frontier. Helpful info from our friend Ryan at OLCF: ..."the CC compiler wrapper for PrgEnv-gnu doesn't support HIP, because gcc (unlike clang) doesn't have support for HIP yet."

nicolemarsaglia commented 9 months ago

@mvictoras Unfortunately we haven't had the greatest success with these builds. We are roadblocked on Frontier because the PrgEnv-gnu compiler wrappers do not support HIP. On Summit, I was able to get a build with a newer CUDA version, but the majority of my tests are failing with a CUDA device error in VTK-m.

yslan commented 8 months ago

@nicolemarsaglia I am able to run NekRS + Ascent on Frontier with the ascent module.

Here are the modules I use:

module load PrgEnv-gnu
module load craype-accel-amd-gfx90a
module load cray-mpich
module load rocm
module load ascent/0.8.0
module unload cray-libsci

module list

export MPICH_GPU_SUPPORT_ENABLED=1

Currently Loaded Modules:
  1) craype-x86-trento                      10) PrgEnv-gnu/8.3.3
  2) libfabric/1.15.2.0                     11) darshan-runtime/3.4.0
  3) craype-network-ofi                     12) hsi/default
  4) perftools-base/22.12.0                 13) DefApps/default
  5) xpmem/2.6.2-2.5_2.22__gd067c3f.shasta  14) craype-accel-amd-gfx90a
  6) cray-pmi/6.1.8                         15) cray-mpich/8.1.23
  7) gcc/12.2.0                             16) rocm/5.3.0
  8) craype/2.7.19                          17) ascent/0.8.0
  9) cray-dsmml/0.2.2

I'm also using my own branch of NekRS which is based on our latest release, v23. Let me know if you need any further information.

nicolemarsaglia commented 8 months ago

@yslan thanks for the info! I'm shocked there is an ascent module on Frontier. Unfortunately, ascent/0.8.0 will not have HIP/GPU support, but ascent/0.9.0 does, though that version is missing some key performance fixes.

yslan commented 8 months ago

ascent/0.8.0 will not have HIP/GPU support

Hmm.... I have been running NekRS + Ascent on Frontier up to 75 Frontier nodes, and it runs pretty well.

NekRS is running on GPU for sure, and I found from our interface that we pass the GPU pointer to Ascent. I have a hard time believing it can get the data if Ascent is running on the host.

We need @mvictoras to double-check what is actually happening.

On the other hand, do you happen to know which version of Ascent is in that module? I can find the path to the installed location but I can't find the source code.

/sw/frontier/spack-envs/base/opt/cray-sles15-zen3/gcc-12.2.0/ascent-0.8.0-6j27g2kx4a3zpg5ojh27ffhqsuurodzy/
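One way to recover the exact spec behind a facility spack install, even after the source stage is gone, is the metadata spack keeps alongside every prefix. Hedged sketch: this assumes the `.spack` directory survived; older spack versions write `spec.yaml` instead of `spec.json`.

```shell
# Spack records the full concretized spec next to every install prefix.
PREFIX=/sw/frontier/spack-envs/base/opt/cray-sles15-zen3/gcc-12.2.0/ascent-0.8.0-6j27g2kx4a3zpg5ojh27ffhqsuurodzy
ls "$PREFIX/.spack"
# spec.json names the exact version, variants (+cuda, +rocm, ...), and deps:
grep -o '"version": *"[^"]*"' "$PREFIX/.spack/spec.json" | head
```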
cyrush commented 8 months ago

@yslan those are facility builds created with spack, so I think the spack source stage is probably gone.

The CUDA and HIP runtimes differ with respect to GPU vs. host access pitfalls.

You could confirm by running a profiler to look at GPU work.
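For example (a hedged sketch: `rocprof` ships with ROCm, but flags and launcher wiring vary by site, and the case name here is hypothetical):

```shell
# If the kernel-stats table comes back empty while the app is busy,
# the work is staying on the host. Wrap one rank to keep output readable.
srun -N1 -n1 rocprof --stats --hip-trace ./nekrs --setup case.par
cat results.stats.csv   # per-kernel GPU time; empty/missing => no GPU work
```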

Note: We have only been using build_ascent for HIP builds. We want spack support for HIP, but it was changing so rapidly that we needed a stable way to build for Frontier.
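A HIP build via build_ascent can be sketched as below. Since gcc has no HIP support, this assumes a clang-based programming environment (PrgEnv-cray or PrgEnv-amd); the `enable_hip`/`ROCM_ARCH` variable names are assumptions, so confirm against `scripts/build_ascent/` for your Ascent checkout.

```shell
# Sketch: HIP build of Ascent on Frontier via the build_ascent helper.
# gcc cannot compile HIP, so use a clang-based PrgEnv.
module load PrgEnv-cray craype-accel-amd-gfx90a rocm cmake cray-mpich

cd ascent/scripts/build_ascent
env enable_mpi=ON enable_hip=ON ROCM_ARCH=gfx90a ./build_ascent.sh
```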

yslan commented 8 months ago

It looks like the one I was using is rendered with OpenMP Offload. (screenshot attached)

cyrush commented 8 months ago

I see - I think it is using OpenMP on the CPU, not the GPU. A GPU build should improve performance.

yslan commented 8 months ago

I think it is using OpenMP on the CPU not GPU.

Is there any way to confirm this? On my end, I will try to set up a timer and build our own benchmark.

GPU build should improve performance

For HIP, Camp's build system seems to only support LLVM right now and we need GNU.

We only send a GPU pointer to Ascent. Does OpenMP somehow manage to use that while running on the CPU?
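Host-side OpenMP cannot transparently dereference a device allocation, so one way to see what kind of pointer actually reaches the interface is to ask the HIP runtime. A sketch for ROCm 5.x, where the attribute field is `memoryType` (ROCm 6 renamed it to `type`); it needs an AMD GPU and ROCm to build and run, and the sizes here are illustrative:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Report where a pointer handed to the in situ interface actually lives.
// ROCm 5.x: attr.memoryType; ROCm 6 renamed this field to attr.type.
static void where_is(const void* p)
{
    hipPointerAttribute_t attr;
    if (hipPointerGetAttributes(&attr, p) != hipSuccess) {
        std::printf("%p: plain host pointer (unknown to HIP)\n", p);
        return;
    }
    std::printf("%p: %s memory on device %d\n", p,
                attr.memoryType == hipMemoryTypeDevice ? "device" : "host",
                attr.device);
}

int main()
{
    double* d = nullptr;
    (void)hipMalloc(&d, 1024 * sizeof(double)); // device alloc, like NekRS passes
    double h[4] = {0, 0, 0, 0};                 // ordinary host memory
    where_is(d);
    where_is(h);
    (void)hipFree(d);
    return 0;
}
```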