This PR adds WENO-Z (Borges et al., 2008) and TENO (Fu et al., 2016) options on top of the existing WENO-JS and WENO-M schemes. WENO-Z is less dissipative and less costly than WENO-M, and is much less dissipative than WENO-JS at a similar speed. TENO has even better spectral properties than WENO-Z, but can be less robust in extreme scenarios.
Please refer to the updated documentation for more details.
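To illustrate the dissipation claim, here is a minimal pure-Python sketch of the textbook fifth-order WENO-JS and WENO-Z nonlinear weights for a reconstruction at the right cell face. It is not the MFC Fortran implementation: the function names, the `eps` default, and the WENO-Z exponent of 1 are assumptions made for illustration.

```python
# Illustrative sketch only; names and defaults are assumptions, not MFC code.

IDEAL = (0.1, 0.6, 0.3)  # ideal (linear) weights of the 5th-order scheme


def smoothness_indicators(f):
    """Jiang-Shu smoothness indicators for the three 3-point sub-stencils."""
    f0, f1, f2, f3, f4 = f
    b0 = 13 / 12 * (f0 - 2 * f1 + f2) ** 2 + 0.25 * (f0 - 4 * f1 + 3 * f2) ** 2
    b1 = 13 / 12 * (f1 - 2 * f2 + f3) ** 2 + 0.25 * (f1 - f3) ** 2
    b2 = 13 / 12 * (f2 - 2 * f3 + f4) ** 2 + 0.25 * (3 * f2 - 4 * f3 + f4) ** 2
    return (b0, b1, b2)


def normalize(alpha):
    """Scale unnormalized weights so that they sum to one."""
    total = sum(alpha)
    return tuple(a / total for a in alpha)


def weights_js(f, eps=1e-6):
    """WENO-JS weights: alpha_k = d_k / (eps + beta_k)^2."""
    beta = smoothness_indicators(f)
    return normalize([d / (eps + b) ** 2 for d, b in zip(IDEAL, beta)])


def weights_z(f, eps=1e-6):
    """WENO-Z weights: alpha_k = d_k * (1 + tau5 / (beta_k + eps))."""
    beta = smoothness_indicators(f)
    tau5 = abs(beta[0] - beta[2])  # global smoothness indicator
    return normalize([d * (1 + tau5 / (b + eps)) for d, b in zip(IDEAL, beta)])


def reconstruct(f, weights):
    """Weighted sum of the three candidate face values at i+1/2."""
    f0, f1, f2, f3, f4 = f
    candidates = (
        (2 * f0 - 7 * f1 + 11 * f2) / 6,
        (-f1 + 5 * f2 + 2 * f3) / 6,
        (2 * f2 + 5 * f3 - f4) / 6,
    )
    return sum(w * p for w, p in zip(weights, candidates))


def distance_from_ideal(weights):
    """Sum of absolute deviations from the ideal weights."""
    return sum(abs(w - d) for w, d in zip(weights, IDEAL))
```

On a smooth but non-trivial stencil such as `[0, 1, 8, 27, 64]`, the WENO-Z weights stay closer to the ideal weights than the WENO-JS weights, which is the mechanism behind the lower dissipation. On a stencil containing a jump, such as `[0, 0, 0, 1, 1]`, both variants put almost all the weight on the smooth sub-stencil.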
Type of change
[x] New feature (non-breaking change which adds functionality)
Scope
[x] This PR comprises a set of related changes with a common goal
How Has This Been Tested?
[x] 1D Shu-Osher Comparison at Nx=200
The 1D Shu-Osher test results are compared against those of Fu et al. (2016). Note that $C_T = 10^{-6}$ is used for TENO in our test, as opposed to the $C_T = 10^{-5}$ stated in the paper. The 'exact' solution is calculated using WENO-JS at Nx=2000, as suggested by Fu et al. (2016). The case files are included in the updated examples folder.
Fu et al. (2016)
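For readers unfamiliar with the role of $C_T$, the following is a rough pure-Python sketch of a five-point TENO stencil selection. It is not the MFC implementation. It assumes the commonly quoted constants $C = 1$ and $q = 6$, and, as a simplification, it uses the WENO-Z-style global indicator $\tau_5 = |\beta_0 - \beta_2|$; the function names and the `eps` default are hypothetical.

```python
# Illustrative sketch only; names, eps, and the choice of tau5 are assumptions.

IDEAL = (0.1, 0.6, 0.3)  # ideal (linear) weights of the 5-point scheme


def smoothness_indicators(f):
    """Jiang-Shu smoothness indicators for the three 3-point sub-stencils."""
    f0, f1, f2, f3, f4 = f
    b0 = 13 / 12 * (f0 - 2 * f1 + f2) ** 2 + 0.25 * (f0 - 4 * f1 + 3 * f2) ** 2
    b1 = 13 / 12 * (f1 - 2 * f2 + f3) ** 2 + 0.25 * (f1 - f3) ** 2
    b2 = 13 / 12 * (f2 - 2 * f3 + f4) ** 2 + 0.25 * (3 * f2 - 4 * f3 + f4) ** 2
    return (b0, b1, b2)


def teno5_weights(f, c_t=1e-6, eps=1e-16):
    """Return TENO weights: each sub-stencil is either kept or discarded."""
    beta = smoothness_indicators(f)
    tau5 = abs(beta[0] - beta[2])  # assumed WENO-Z-style global indicator
    # Strong scale separation: gamma_k = (C + tau5 / (beta_k + eps))^q, C=1, q=6.
    gamma = [(1 + tau5 / (b + eps)) ** 6 for b in beta]
    total = sum(gamma)
    chi = [g / total for g in gamma]
    # Sharp cutoff: a sub-stencil whose normalized measure falls below C_T
    # is removed entirely instead of being given a small weight.
    delta = [0.0 if c < c_t else 1.0 for c in chi]
    kept = sum(d * k for d, k in zip(IDEAL, delta))
    return tuple(d * k / kept for d, k in zip(IDEAL, delta))
```

In this sketch a smooth stencil keeps all three sub-stencils and recovers the ideal weights exactly, while a stencil containing a jump keeps only the smooth sub-stencil. Lowering `c_t` discards fewer sub-stencils, which reduces dissipation at some cost in robustness; this is the trade-off behind choosing $10^{-6}$ over $10^{-5}$.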
[x] 1D Shu-Osher Comparison at Nx=1000
The same test but at higher resolution to show convergence.
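The comparisons in this PR are made visually against plots. If a single number is wanted, one option is a discrete L1-type error against the Nx=2000 reference. The sketch below assumes uniform grids over the same domain and a fine grid whose size is an integer multiple of the coarse one (as with 2000 versus 200 or 1000); the helper names are hypothetical and not part of MFC.

```python
# Illustrative sketch only; helper names are hypothetical, not part of MFC.


def block_average(fine, ratio):
    """Average consecutive groups of `ratio` fine cells into coarse cells."""
    if ratio < 1 or len(fine) % ratio != 0:
        raise ValueError("fine grid size must be a multiple of ratio")
    return [sum(fine[i:i + ratio]) / ratio for i in range(0, len(fine), ratio)]


def mean_abs_error(coarse, fine_reference):
    """Mean absolute difference between a coarse solution and the
    reference solution restricted to the coarse grid."""
    ratio, remainder = divmod(len(fine_reference), len(coarse))
    if remainder != 0:
        raise ValueError("reference size must be a multiple of coarse size")
    restricted = block_average(fine_reference, ratio)
    return sum(abs(c - r) for c, r in zip(coarse, restricted)) / len(coarse)
```

Evaluating `mean_abs_error` for the Nx=200 and Nx=1000 density fields of each scheme would then give one number per scheme and resolution that should decrease as the grid is refined.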
[x] 2D Riemann Problem
The case files are the same as those in the example folder, with WENO-JS changed to WENO-M, WENO-Z, and TENO. The results show slight differences due to the chaotic nature of the simulation, and the better spectral properties are evident in the more prominent swirls at the same resolution (TENO > WENO-Z > WENO-M > WENO-JS).
WENO-JS (on MFC Website - Example Cases)
WENO-M | WENO-Z | TENO
[x] 2D Shock-Droplet Interaction
The case files are the same as those in the example folder, with WENO-M changed to WENO-Z and TENO respectively. The results are similar to each other and, as expected, to the original WENO-M result on the MFC website.
WENO-M (on MFC Website - Example Cases)
WENO-Z | TENO
Test Configuration:
What computers and compilers did you use to test this:
1D tests: Ubuntu 22.04.4 LTS; GNU v11.4.0
2D tests: Richardson
Checklist
[x] I have added comments for the new code
[x] I added Doxygen docstrings to the new code
[x] I have made corresponding changes to the documentation (docs/)
[x] I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
[x] I have added example cases in examples/ that demonstrate my new feature performing as expected.
They run to completion and demonstrate "interesting physics"
[x] I ran ./mfc.sh format before committing my code
[x] New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled (Note: not for AMD hardware)
[x] This PR does not introduce any repeated code (it follows the DRY principle)
[x] I cannot think of a way to condense this code and reduce any introduced additional line count
If your code changes any code source files (anything in src/simulation)
To make sure the code is performing as expected on GPU devices, I have:
[x] Checked that the code compiles using NVHPC compilers
[ ] Checked that the code compiles using CRAY compilers
[x] Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
[ ] Ran the code on MI200+ GPUs and ensured the new features performed as expected (the GPU results match the CPU results)
[x] Enclosed the new feature via nvtx ranges so that they can be identified in profiles
[x] Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR: nsys_report_wenoz.txt
[ ] Ran an Omniperf profile using ./mfc.sh run XXXX --gpu -t simulation --omniperf, and have attached the output file and plain text results to this PR.
[x] Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature
16M points
Comments
New tests are generated for WENO-Z and TENO, for weno_order=3,5 and dim=1,2,3 where applicable. The golden files for WENO-M and MP_WENO are regenerated due to an improvement in how case files are generated (the parameters are now optional), but all tests were run before and after the regeneration to ensure that the results match. All tests pass both before and after the regeneration, on CPU (locally on Ubuntu and on Richardson) and on GPU (Delta).
The current implementation for WENO-M (an if statement within the WENO-JS code path) is improved using case-optimization, which is necessary for adding new WENO variants without affecting performance. The attached performance report shows that the performance of the new code is virtually identical to the original code for WENO-JS and WENO-M, and marginally better at 8 GPUs. The performance of WENO-Z and TENO matches expectations from theory and other papers, ranked from fastest to slowest: WENO-JS ~ WENO-Z > TENO >> WENO-M.
I only have access to Delta, which has NVIDIA GPUs, but the changes to the GPU code are minimal and have very simple logic. Extensive tests have been performed on those GPUs to ensure the results are correct and the performance is as expected.
This is an example of a great PR! Good use of added tests, toolchain involvement, proof that it works as intended, and only the required minimal code additions. Will review it a bit more thoroughly tomorrow.
2D Riemann problem, TENO:
![2D_riemann_test_teno](https://github.com/MFlowCode/MFC/assets/120074479/e61d1697-91e7-4082-9924-887abf50dd82)
2D shock-droplet interaction, TENO:
![2D_shockdroplet_TENO](https://github.com/MFlowCode/MFC/assets/120074479/15222c28-5e52-4366-8752-92077552b9c2)
GPU performance, 16M points:
![GPU_performance](https://github.com/MFlowCode/MFC/assets/120074479/b17be55b-9d09-4525-a616-1ccffafc5f88)