MFlowCode / MFC

Exascale simulation of multiphase/physics fluid dynamics
https://mflowcode.github.io
MIT License

OpenACC + Cray CCE + AMD MI200+ #368

Closed: anandrdbz closed this 6 months ago

anandrdbz commented 7 months ago

Description

Adds support for AMD MI200+ GPUs via the Cray CCE compilers and OpenACC.

Type of change


Scope

Closes #352 #383 #384

Test Configuration:

sbryngelson commented 6 months ago

Conflicts need to be resolved

sbryngelson commented 6 months ago

Aside from the test suite, the benchmarks are also failing on GPU:

  Case                     Pre Process   Simulation   Post Process
 ──────────────────────────────────────────────────────────────────
  viscous_weno5_sgb_mono         1.00x          N/A            N/A
  5eq_rk3_weno3_hllc             0.50x        0.98x          1.33x
  ibm                            1.00x        1.04x          1.00x
  hypo_hll                       1.00x          N/A            N/A

sbryngelson commented 6 months ago

Change the compute name in ./mfc.sh load from Crusher to Frontier.

Update: Did this myself in https://github.com/MFlowCode/MFC/pull/368/commits/110a290dc1d744dfe7ca7c7387c6d16b43c07d37

sbryngelson commented 6 months ago

./mfc.sh test -a -- -c frontier does not work.

Specifically:

FileNotFoundError: [Errno 2] No such file or directory:
'/lustre/orion/cfd154/scratch/sbryngelson/MFC/build/install/dependencies/bin/h5dump'

and

sbryngelson/scratch $ ls MFC/build/install/dependencies/bin/
hipfc

sbryngelson commented 6 months ago

It's looking like Frontier CI may fail for the 2-rank case. Tests were run with

./mfc.sh test -j 8 -- -c frontier

The test MFC.sh file in the 2-rank directory reads

(set -x; srun -N 1 -n 2 "/lustre/orion/cfd154/scratch/sbryngelson/runner/actions-runner/_work/MFC/MFC/build/install/0571538fd2/bin/simulation")

This appears to be the problem: it should be passing --ntasks-per-node (or something similar), since we are using -- -c frontier.

Update: It passed on the second try 🤷
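For illustration, a minimal sketch of a launch line that states the per-node rank layout explicitly, as suggested above. The function name build_srun_command is hypothetical and this is not MFC's actual batch template; -N, -n and --ntasks-per-node are standard srun options. Since the test passed on a second attempt, the missing flag may not have been the real cause.

```python
import shlex


def build_srun_command(binary, nodes, tasks_per_node):
    """Hypothetical helper: build an srun command that pins the number of
    MPI ranks per node instead of relying on Slurm's default placement."""
    ntasks = nodes * tasks_per_node  # total MPI ranks across all nodes
    return [
        "srun",
        "-N", str(nodes),
        "-n", str(ntasks),
        f"--ntasks-per-node={tasks_per_node}",
        binary,
    ]


# The 2-rank test case: one node, two ranks on that node.
print(shlex.join(build_srun_command("./simulation", 1, 2)))
```

Returning a list rather than a string keeps each argument intact; shlex.join is only used to display it as a shell command.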

sbryngelson commented 6 months ago

@henryleberre, do you know why it doesn't build h5dump? (or at least it isn't found in the expected bin/ directory)

henryleberre commented 6 months ago

@sbryngelson We opted not to build HDF5 on CCE. I forget why; perhaps there were some incompatibilities. We use the cray-hdf5 module, so h5dump should already be available.

sbryngelson commented 6 months ago

@henryleberre you are correct, h5dump is already in the path. It looks like the problem is that using test -a forces it to look in dependencies/bin/h5dump for the binary (rather than the path broadly). Is there a fix for this?

Here, in ./mfc/test/test.py:

    h5dump = f"{HDF5.get_install_dirpath()}/bin/h5dump"

It does look like we have this option:

    if ARG("no_hdf5"):
        if not does_command_exist("h5dump"):
            raise MFCException("--no-hdf5 was specified and h5dump couldn't be found.")

        h5dump = shutil.which("h5dump")

though it doesn't seem to work when run like this: ./mfc.sh test -a -j 1 -- -c frontier --no-hdf5
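One possible fix, sketched under assumptions: use the bundled h5dump when MFC built HDF5 itself, and otherwise fall back to whatever h5dump is on PATH (for example, the one provided by the cray-hdf5 module). resolve_h5dump and its install_dir argument are hypothetical stand-ins for the HDF5.get_install_dirpath() lookup in the snippet above, not MFC's actual API.

```python
import os
import shutil


def resolve_h5dump(install_dir):
    """Hypothetical helper: locate h5dump, preferring MFC's own build.

    install_dir stands in for the HDF5 install prefix (assumption).
    """
    bundled = os.path.join(install_dir, "bin", "h5dump")
    # Use the bundled binary only if it exists and is executable.
    if os.path.isfile(bundled) and os.access(bundled, os.X_OK):
        return bundled
    # Otherwise search PATH, e.g. for a system module such as cray-hdf5.
    on_path = shutil.which("h5dump")
    if on_path is not None:
        return on_path
    raise FileNotFoundError(
        f"h5dump not found in {os.path.dirname(bundled)} or on PATH"
    )
```

With this fallback, a CCE build that skips HDF5 would not need --no-hdf5 at all, at the cost of silently picking up a system h5dump whose version may differ from the one MFC would have built.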

henryleberre commented 6 months ago

@sbryngelson I'm testing a fix. For your command, you would have to use this instead:

$ ./mfc.sh test -a --no-hdf5 -- -c frontier

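A plausible reading of why --no-hdf5 had to move: arguments after the -- separator are forwarded to the run command instead of being parsed by test itself, so a flag placed there never reaches the test options. The sketch below illustrates that split with argparse; parse_test_args and the dest names are hypothetical, and MFC's real parser may handle this differently.

```python
import argparse


def parse_test_args(argv):
    """Hypothetical sketch: options before the first '--' belong to `test`;
    everything after it is forwarded untouched to the run command."""
    if "--" in argv:
        idx = argv.index("--")
        own, forwarded = argv[:idx], argv[idx + 1:]
    else:
        own, forwarded = argv, []

    parser = argparse.ArgumentParser(prog="test")
    parser.add_argument("-a", dest="test_all", action="store_true")
    parser.add_argument("-j", dest="jobs", type=int, default=1)
    parser.add_argument("--no-hdf5", action="store_true")
    return parser.parse_args(own), forwarded


# Flag before '--': test sees it.
args, fwd = parse_test_args(["-a", "--no-hdf5", "--", "-c", "frontier"])
print(args.no_hdf5, fwd)

# Flag after '--': it is only forwarded, so test never sees it.
args, fwd = parse_test_args(["-a", "-j", "1", "--", "-c", "frontier", "--no-hdf5"])
print(args.no_hdf5, fwd)
```
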
sbryngelson commented 6 months ago

@henryleberre this works!

sbryngelson commented 6 months ago

Closes #352 #383 #384