Closed lawrenceccheung closed 1 month ago
Because this fails at the first projection, and bad values of Min u and Max v appear before the projection, I wonder if the initialization is faulty. Do you have sampling or plt data at the 0 timestep? If those look fine, it will be pretty conclusive that the faulty data comes in via the source terms that are used prior to the MAC projection.
I can reproduce this behavior consistently. The plt file at the 0 timestep is consistent between the working CPU simulation and the failing GPU run. So this is most likely from the source terms.
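For anyone repeating the timestep-0 comparison, here is a minimal sketch of one way to quantify "consistent". It assumes the field values have already been extracted from the CPU and GPU plt files into two flat, equally ordered sequences (reading the plt files is not shown), and both the function name compare_fields and the tolerance are hypothetical choices, not anything from AMR-Wind.

```python
import math


def compare_fields(cpu_values, gpu_values, tol=1.0e-12):
    """Summarise how two equally long sequences of field values differ.

    tol is an assumed absolute tolerance; pick one suited to the field.
    """
    if len(cpu_values) != len(gpu_values):
        raise ValueError("field samples must have the same length")
    non_finite = 0
    max_abs_diff = 0.0
    for cpu, gpu in zip(cpu_values, gpu_values):
        # Count inf/NaN entries separately so they do not poison the difference.
        if not (math.isfinite(cpu) and math.isfinite(gpu)):
            non_finite += 1
            continue
        max_abs_diff = max(max_abs_diff, abs(cpu - gpu))
    return {
        "non_finite": non_finite,
        "max_abs_diff": max_abs_diff,
        "consistent": non_finite == 0 and max_abs_diff <= tol,
    }
```

If this reports consistent at timestep 0 while the GPU run still blows up before the MAC projection, that supports looking at the source terms rather than the initialization.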
==============================================================================
Step: 1 dt: 0.5 Time: 0 to 0.5
CFL: 0.308108 (conv: 0.308108 diff: 0 src: 0 )
L-inf norm summary: before predictor step
..............................................................................
Level 0
velocity 4.065279426 12.32432027 0
gp 0 0 0
temperature 305
tke 0.1
..............................................................................
Godunov:
System Iters Initial residual Final residual
----------------------------------------------------------------------------
L-inf norm MAC vels: before MAC projection
..............................................................................
Max u: -100000000 | Location (x,y,z): 0, 0, 0
Min u: -inf | Location (x,y,z): 0, 0, 0
Max v: inf | Location (x,y,z): 0, 0, 0
Min v: 100000000 | Location (x,y,z): 0, 0, 0
Max w: 0.040875 | Location (x,y,z): 950, 950, 940
Min w: 0 | Location (x,y,z): 950, 950, 960
..............................................................................
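To catch this kind of failure automatically, the MAC velocity summary can be scanned for non-finite or implausibly large extremes. The following is a minimal sketch that assumes the log line layout shown above; the helper name find_bad_mac_velocities and the 1e6 magnitude threshold are hypothetical and not part of AMR-Wind.

```python
import math
import re

# Matches lines such as "Max u: -100000000 | Location (x,y,z): 0, 0, 0".
EXTREME_RE = re.compile(
    r"^(Max|Min)\s+([uvw]):\s*(\S+)\s*\|\s*Location \(x,y,z\):\s*(.+)$"
)

# Assumed threshold, not an AMR-Wind constant: larger magnitudes are suspect.
SUSPECT_MAGNITUDE = 1.0e6


def find_bad_mac_velocities(log_text, limit=SUSPECT_MAGNITUDE):
    """Return (kind, component, value, location) for each suspicious extreme."""
    bad = []
    for line in log_text.splitlines():
        match = EXTREME_RE.match(line.strip())
        if match is None:
            continue
        kind, comp, raw, location = match.groups()
        try:
            value = float(raw)  # float() accepts "inf", "-inf" and "nan"
        except ValueError:
            continue  # not a number, so not a velocity line we understand
        if not math.isfinite(value) or abs(value) >= limit:
            bad.append((kind, comp, value, location.strip()))
    return bad


SAMPLE_LOG = """\
Max u: -100000000 | Location (x,y,z): 0, 0, 0
Min u: -inf | Location (x,y,z): 0, 0, 0
Max v: inf | Location (x,y,z): 0, 0, 0
Min v: 100000000 | Location (x,y,z): 0, 0, 0
Max w: 0.040875 | Location (x,y,z): 950, 950, 940
Min w: 0 | Location (x,y,z): 950, 950, 960
"""

if __name__ == "__main__":
    for kind, comp, value, location in find_bad_mac_velocities(SAMPLE_LOG):
        print(f"{kind} {comp} = {value} at ({location})")
```

On the excerpt above this flags the four u and v entries and leaves the two w entries alone.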
In my test runs, I can reliably get the MMC forcing to work on GPUs with #1085. @lawrenceccheung, please re-open this if that still hasn't fixed it.
Bug description
Using the meso/microscale forcing (ABLMesoForcingMom source term) seems to fail on GPUs for me, but works fine on CPUs. The failure shows up as the velocities blowing up (infinite velocities). It could be an issue with the application of the source term, but also with the ABL.initial_condition_input_file option, which reads the initial condition from a NetCDF file.
Steps to reproduce
The case setup, including input files, is given here: https://github.com/FLOWMAS-EERC/microscale_surrogate_wakes/blob/main/MMCdemo/MMC_neutraldemo1_10x10_20m.ipynb. The failure appears to be independent of the mesh size and of the exact MMC forcing.
Steps to reproduce the behavior:
Compiler used
Operating system
Hardware:
Machine details ():
All setup files (input files and NetCDF inputs) can be generated with this notebook, https://github.com/FLOWMAS-EERC/microscale_surrogate_wakes/blob/main/MMCdemo/MMC_neutraldemo1_10x10_20m.ipynb, and the exact files are also on Frontier.
The failure happens on the first iteration with the MAC projection:
No segfault encountered here.
Expected behavior
A normal MMC-coupled case that runs properly is shown here: https://github.com/FLOWMAS-EERC/microscale_surrogate_wakes/blob/main/MMCdemo/Postpro_neutraldemo1_10x10_20.ipynb
AMR-Wind information
This is the exe used on Frontier:
Additional context