Closed wilfonba closed 6 months ago
It seems that we should add an argument option to the case file that indicates whether it is running on a CPU or a GPU. If it's a CPU, it should use fewer time steps than the GPU case (so that the benchmarks run for roughly similar amounts of time).
Please add a description of each case and what features it benchmarks at the top of the case file (or somewhere else easy to find). Right now if one benchmark result is much slower than another, it would not be clear what is causing the slowdown.
Information about what's enabled in each case is already given in comments in bench.yaml. I'll add a GPU vs CPU input though.
can you rename the cases to reflect (roughly) what they're checking?
output.txt files
Test5/ directory
I'm unsure what Test5 directory you're referring to. I can't find it anywhere, and searching the repo doesn't show anything.
edit: Removed comment about shortening names. Found a decent way.
is this ready to merge @wilfonba ?
I need to finish wiring up 'DICT["gpu"]' in the case file. I was rushing to finish it before going out and kept running into issues because of that. I'll finish it and send a message either tonight or tomorrow.
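A hedged sketch of the idea, assuming a `DICT`-style argument dictionary in the case file (the helper name and the step counts below are mine, for illustration only, not the actual MFC case-file API):

```python
# Hypothetical sketch: branch the time-step count on a GPU flag so that
# CPU and GPU benchmark runs take roughly similar wall time.
def pick_time_steps(case_args, gpu_steps=3000, cpu_steps=300):
    """Return a step count: fewer steps for the slower CPU runs."""
    return gpu_steps if case_args.get("gpu", False) else cpu_steps

# Example usage inside a case file: set t_step_stop from the flag.
DICT = {"gpu": True}
t_step_stop = pick_time_steps(DICT)  # gpu_steps when running on GPUs
```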
I'm having issues where the git clone for hdf5 fails more than 3 times. I'll try again tomorrow and hope whatever hairball is causing this is resolved.
@sbryngelson This should be ready to merge as soon as all of the tests run. I haven't verified how long the GPU benchmarking takes, but the CI will indicate if I need to make any changes.
I think the benchmarking CI doesn't work until merged, correct? Though I guess it will run the benchmarks on the PR, so maybe it does...
Yeah. It won't actually compare anything, but I think it will still run the cases. I can also just run it manually on phoenix.
@wilfonba GitHub was probably rate-limiting your clone requests.
I'm not sure. Checking the GPU logs, all the jobs (and their actual case outputs) completed successfully; only the toolchain (seemingly) failed. It likely failed because the master benchmarks are different cases than the PR ones. If that is indeed the case, we should throw a proper exception here.
Relevant code snippet https://github.com/wilfonba/MFC-Wilfong/blob/ef9cf27924c486cec299d24b97c2b16e5dc0f819/toolchain/mfc/bench.py#L47-L69
Edit: No, problem is in the snippet probably (though it appears to already check? maybe something going wrong there?) https://github.com/wilfonba/MFC-Wilfong/blob/ef9cf27924c486cec299d24b97c2b16e5dc0f819/toolchain/mfc/bench.py#L79-L85
Other than this, I'm glad to merge the PR. The cases are quite good and take ~50 minutes according to the SLURM log in the artifact. If there is a simple fix for this @henryleberre I would like to include it in the PR but if not then we can merge.
@sbryngelson It did fail for that reason, but not in the manner you expected. The code already handles the situation where the cases run in the two compared benchmarks differ, by comparing only cases whose names are common to both benchmark result files (taking the set intersection of case names). I tested this to make sure it worked.
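The intersection logic described above can be sketched roughly like this (the `"cases"` field name is an assumption for illustration, not necessarily the actual bench.yaml schema):

```python
def common_cases(lhs, rhs):
    """Compare only benchmark cases whose names appear in both result
    files, i.e. the set intersection of case names."""
    lhs_names, rhs_names = set(lhs["cases"]), set(rhs["cases"])
    if lhs_names != rhs_names:
        # Mirrors the "Cases in lhs and rhs are not equal" warning.
        print("Warning: Cases in lhs and rhs are not equal.")
    return lhs_names & rhs_names
```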
The error here arose from an oversight on my end. We also check whether the metadata field differs between the compared benchmarks. It includes both the CLI invocation and the modes MFC was run in (e.g. --gpu, --mpi, --debug, etc.). When they differ, we produce a warning. However, the code I wrote also aborted immediately. I fixed this and gave the warning messages some more polish.
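A minimal sketch of the warn-but-continue behavior (the function name and dict-shaped metadata are my assumptions, not the actual bench.py code):

```python
def metadata_match(lhs_meta, rhs_meta):
    """Warn when metadata (CLI invocation + modes) differ, but return
    instead of aborting so the case comparison still runs."""
    if lhs_meta == rhs_meta:
        return True
    print("Warning: Metadata in lhs and rhs are not equal.")
    print("This could mean that the benchmarks are not comparable "
          "(e.g. one was run on CPUs and the other on GPUs).")
    return False
```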
The new output looks like (ran locally):
~/dev/MFC $ ./mfc.sh bench_diff ~/Downloads/logs-gpu/master/bench-gpu.yaml ~/Downloads/logs-gpu/pr/bench-gpu.yaml
mfc: OK > (venv) Entered the Python 3.12.2 virtual environment (>= 3.8).
.=++*: -+*+=. | henryleberre@Henrys-Laptop.local [Darwin]
:+ -*- == =* . | -----------------------------------------
:*+ == ++ .+- | --jobs 1
:*##-.....:*+ .#%+++=--+=:::. | --mpi
-=-++-======#=--**+++==+*++=::-:. | --no-gpu
.:++=----------====+*= ==..:%..... | --no-debug
.:-=++++===--==+=-+= +. := | --targets pre_process, simulation, and post_process
+#=::::::::=%=. -+: =+ *: | ----------------------------------------------------------
.*=-=*=.. :=+*+: -...-- | $ ./mfc.sh (build, run, test, clean, count, packer) --help
Comparing Benchmarks: ../../Downloads/logs-gpu/master/bench-gpu.yaml is x times slower than ../../Downloads/logs-gpu/pr/bench-gpu.
Warning: Metadata in lhs and rhs are not equal.
This could mean that the benchmarks are not comparable (e.g. one was run on CPUs and the other on GPUs).
lhs:
* Invocation: bench -j 24 -o bench-gpu.yaml -- -c phoenix --gpu -g 0 1 -n 2
* Modes: debug=False gpu=False mpi=True
rhs:
* Invocation: bench --mem 1 -j 24 -o bench-gpu.yaml -- -c phoenix --gpu -g 0 1 -n 2
* Modes: debug=False gpu=False mpi=True
Warning: Cases in lhs and rhs are not equal.
* rhs cases: ibm, viscous_weno5_sgb_mono, 5eq_rk3_weno3_hllc, hypo_hll.
* lhs cases: 3D_shockdroplet.
Using intersection: set() with 0 elements.
Case Pre Process Simulation Post Process
────────────────────────────────────────────────
mfc: (venv) Exiting the Python virtual environment.
Description
This PR replaces the existing shock droplet benchmark with 5 new benchmarking cases that run for more time-steps and combined cover a large portion of the code's features.
Fixes #363
How Has This Been Tested?
This PR should run the benchmarks to test the changes.
Checklist
I ran ./mfc.sh format before committing my code.