MFlowCode / MFC

Exascale simulation of multiphase/physics fluid dynamics
https://mflowcode.github.io
MIT License

Fix Frontier performance regression #413

Closed by wilfonba 1 month ago

wilfonba commented 1 month ago

Description

Fixes a performance regression on Frontier: the current master branch is slower than the optimized code at the head of @abbotts's fork of MFC.

How Has This Been Tested?

Ran the test suite on Frontier.

sbryngelson commented 1 month ago

The benchmarking "passes" even though this is the result:

Run . ./mfc.sh load -c p -m g
mfc: Loading modules (& env variables) for GT Phoenix on GPUs:
mfc:  $ module load python/3.9.12-rkxvr6
mfc:  $ module load cmake/3.23.1-327dbl
mfc:  $ module load cuda/11.7.0-7sdye3
mfc:  $ module load nvhpc/22.11
mfc:  $ export MFC_CUDA_CC=70,80
mfc:  $ export CC=nvc
mfc:  $ export CXX=nvc++
mfc:  $ export FC=nvfortran
mfc: OK > All modules and environment variables have been loaded.
mfc: OK > (venv) Entered the Python 3.9.12 virtual environment (>= 3.8).

      .=++*:          -+*+=.        | sbryngelson3@login-phoenix-slurm-4.pace.gatech.edu [Linux]
     :+   -*-        ==   =* .      | ----------------------------------------------------------
   :*+      ==      ++    .+-       | --jobs 1
  :*##-.....:*+   .#%+++=--+=:::.   | --mpi
  -=-++-======#=--**+++==+*++=::-:. | --no-gpu
 .:++=----------====+*= ==..:%..... | --no-debug
  .:-=++++===--==+=-+=   +.  :=     | --targets pre_process, simulation, and post_process
  +#=::::::::=%=. -+:    =+   *:    | ----------------------------------------------------------
 .*=-=*=..    :=+*+:      -...--    | $ ./mfc.sh (build, run, test, clean, count, packer) --help

 Comparing Benchmarks:
        1.5x indicates pr/bench-cpu.yaml is 1.5-times as fast as master/bench-cpu.yaml (so pr/bench-cpu.yaml is faster than master/bench-cpu.yaml).
        0.5x indicates pr/bench-cpu.yaml is 0.5-times as fast as master/bench-cpu.yaml (so pr/bench-cpu.yaml is slower than master/bench-cpu.yaml).

  Case                     Pre Process   Simulation   Post Process  
 ────────────────────────────────────────────────────────────────── 
  viscous_weno5_sgb_mono         1.00x          N/A            N/A  
  5eq_rk3_weno3_hllc             1.00x          N/A            N/A  
  ibm                            2.00x          N/A            N/A  
  hypo_hll                       0.50x          N/A            N/A

I opened an issue for this: https://github.com/MFlowCode/MFC/issues/415
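
For reference, here is a minimal sketch of how a comparison like the one above could refuse to "pass" when a target has no timing. This is not MFC's actual benchmarking code: the function names, the `results` layout, and the timings are all hypothetical. It assumes the ratio in the table is master time divided by PR time, which matches the "1.5-times as fast" wording in the log.

```python
from typing import Dict, List, Optional, Tuple

# (master_time, pr_time) in seconds; None means the target produced no timing (shown as N/A).
Timing = Tuple[Optional[float], Optional[float]]


def speedup(master_time: Optional[float], pr_time: Optional[float]) -> Optional[float]:
    """Return how many times as fast the PR is relative to master, or None if unavailable."""
    if master_time is None or pr_time is None or pr_time <= 0:
        return None
    return master_time / pr_time


def missing_entries(results: Dict[str, Dict[str, Timing]]) -> List[Tuple[str, str]]:
    """List every (case, target) pair for which no speedup could be computed."""
    missing = []
    for case, targets in results.items():
        for target, (master_time, pr_time) in targets.items():
            if speedup(master_time, pr_time) is None:
                missing.append((case, target))
    return missing


# Hypothetical timings, chosen only to reproduce the 2.00x and 0.50x rows above.
results = {
    "ibm": {"pre_process": (2.0, 1.0), "simulation": (None, None)},
    "hypo_hll": {"pre_process": (1.0, 2.0), "simulation": (None, None)},
}

missing = missing_entries(results)
if missing:
    # A CI job would exit non-zero here instead of reporting success.
    print(f"Benchmark comparison incomplete: {missing}")
```

With a check like this, the run above would be reported as incomplete because every Simulation and Post Process entry is N/A.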