Exawind / amr-wind

AMReX-based structured wind solver
https://exawind.github.io/amr-wind

MMC forcing fails with GPU #1057

Closed lawrenceccheung closed 1 month ago

lawrenceccheung commented 1 month ago

Bug description

Using the meso/microscale forcing (ABLMesoForcingMom source term) seems to fail on GPUs for me, but works fine on CPUs. The failure shows up as the velocities blowing up to infinite values. It could be an issue with the application of the source term, or with ABL.initial_condition_input_file, which reads the initial condition from a NetCDF file.

Steps to reproduce

The case setup, with input files, is given here: https://github.com/FLOWMAS-EERC/microscale_surrogate_wakes/blob/main/MMCdemo/MMC_neutraldemo1_10x10_20m.ipynb. The failure looks to be independent of the mesh size and the exact MMC forcing.

Steps to reproduce the behavior:

  1. Compiler used

    • [x] GCC
    • [ ] LLVM
    • [ ] oneapi (Intel)
    • [ ] nvcc (NVIDIA)
    • [x] rocm (AMD)
    • [ ] with MPI
    • [x] other: Clang
  2. Operating system

    • [x] Linux
    • [ ] OSX
    • [ ] Windows
    • [ ] other (do tell ;)):
  3. Hardware:

    • [ ] CPU
    • [x] GPU
  4. Machine details ():

I tested it out on the following systems:

System        Result
Frontier GPU  failed
Frontier CPU  works
Summit GPU    failed
Sandia CPU    works

  5. Input file attachments

All setup files (input files and NC inputs) can be generated with this notebook, https://github.com/FLOWMAS-EERC/microscale_surrogate_wakes/blob/main/MMCdemo/MMC_neutraldemo1_10x10_20m.ipynb, and the exact files are also on Frontier.

  6. Error (paste or attach):

The failure happens on the first iteration with the MAC projection:

==============================================================================
Step: 1 dt: 0.5 Time: 0 to 0.5
CFL: 0.308108 (conv: 0.308108 diff: 0 src: 0 )

L-inf norm summary: before predictor step
..............................................................................
Level 0
  velocity                 4.065279426         12.32432027                   0
  gp                                 0                   0                   0
  temperature                      305
  tke                              0.1
..............................................................................

Godunov:
  System                     Iters      Initial residual        Final residual
  ----------------------------------------------------------------------------

L-inf norm MAC vels: before MAC projection
..............................................................................
Max u:         -2.185020045 |  Location (x,y,z):       1340,       1030,         10
Min u:                 -inf |  Location (x,y,z):          0,          0,          0
Max v:                  inf |  Location (x,y,z):          0,          0,          0
Min v:         -12.32431897 |  Location (x,y,z):      10230,       3820,        510
Max w:             0.040875 |  Location (x,y,z):      10230,      10230,        940
Min w:       -0.03247020874 |  Location (x,y,z):       4170,       3930,         20
..............................................................................

  MAC_projection                 0                     0                     0
  7. If this is a segfault, a stack trace from a debug build (paste or attach):

No segfault encountered here.
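
For anyone scanning long logs for this failure, the non-finite entries in the "L-inf norm MAC vels" block above can be picked out automatically. The following is a minimal sketch, not part of AMR-Wind: the function name `find_nonfinite` and the regular expression are my own, and the line layout is assumed to match the log excerpt in this report.

```python
import math
import re

# Matches lines such as:
#   "Min u:    -inf |  Location (x,y,z):   0,   0,   0"
# The layout is assumed from the log excerpt in this issue.
LINE_RE = re.compile(
    r"^(Max|Min)\s+([uvw]):\s+(\S+)\s+\|\s+Location \(x,y,z\):\s*"
    r"([^,\s]+),\s*([^,\s]+),\s*([^,\s]+)\s*$"
)


def find_nonfinite(log_text):
    """Return (label, value, location) for each non-finite MAC velocity entry."""
    bad = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # not a Max/Min velocity line
        kind, comp, value, x, y, z = m.groups()
        val = float(value)  # float() accepts "inf" and "-inf"
        if not math.isfinite(val):
            bad.append((f"{kind} {comp}", val, (float(x), float(y), float(z))))
    return bad


# Lines copied from the failing step-1 output above
sample = """\
Max u:         -2.185020045 |  Location (x,y,z):       1340,       1030,         10
Min u:                 -inf |  Location (x,y,z):          0,          0,          0
Max v:                  inf |  Location (x,y,z):          0,          0,          0
Min v:         -12.32431897 |  Location (x,y,z):      10230,       3820,        510
"""

for label, val, loc in find_nonfinite(sample):
    print(label, val, loc)
```

On the excerpt above this flags only the Min u and Max v entries, both reported at location (0, 0, 0).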

Expected behavior

Normal MMC coupled case which runs properly is shown here: https://github.com/FLOWMAS-EERC/microscale_surrogate_wakes/blob/main/MMCdemo/Postpro_neutraldemo1_10x10_20.ipynb

AMR-Wind information

This is the exe used on Frontier:

==============================================================================
                AMR-Wind (https://github.com/exawind/amr-wind)

  AMR-Wind version :: d4dd236
  AMR-Wind Git SHA :: d4dd236b4c00d20ac024003433ce0036a179914a
  AMReX version    :: 24.01-19-g022f97ea9ebb

  Exec. time       :: Sun May 12 22:01:54 2024
  Build time       :: Feb 29 2024 21:30:39
  C++ compiler     :: Clang 15.0.0

  MPI              :: ON    (Num. ranks = 256)
  GPU              :: ON    (Backend: HIP)
  OpenMP           :: OFF

  Enabled third-party libraries: 
    NetCDF    4.7.4
    HYPRE     2.30.0
    OpenFAST  

           This software is released under the BSD 3-clause license.           
 See https://github.com/Exawind/amr-wind/blob/development/LICENSE for details. 
------------------------------------------------------------------------------

Additional context

mbkuhn commented 1 month ago

Because this fails at the first projection, and bad values of Min u and Max v appear before the projection, I wonder if the initialization is faulty. Do you have sampling or plt data at the 0 timestep? If those look fine, it will be pretty conclusive that the faulty data comes in via the source terms that are used prior to the MAC projection.
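
A quick way to run that check on the step-0 data is to scan each field for NaN or infinite entries. This is only a sketch under assumptions: it presumes the fields have already been loaded into NumPy arrays by whatever reader you use (loading is not shown), and the helper name `report_nonfinite` and the field names are hypothetical.

```python
import numpy as np


def report_nonfinite(fields):
    """Map each field name to the index tuples of its NaN/inf entries."""
    report = {}
    for name, data in fields.items():
        bad = np.argwhere(~np.isfinite(data))
        if bad.size:
            report[name] = [tuple(int(i) for i in idx) for idx in bad]
    return report


# Synthetic stand-in for step-0 data; real arrays would come from the plt file.
velx = np.full((4, 4, 4), 4.0)
vely = np.full((4, 4, 4), 12.0)
vely[0, 0, 0] = np.inf  # injected fault, for demonstration only

print(report_nonfinite({"velocityx": velx, "velocityy": vely}))
```

An empty result for the step-0 fields would point away from the initialization and toward the source terms.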

moprak-nrel commented 1 month ago

I can reproduce this behavior consistently. The plt file at the 0 timestep is consistent between the working CPU simulation and the failing GPU run. So this is most likely from the source terms.
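
For reference, a CPU-versus-GPU consistency check of this kind can be sketched as below. It assumes the step-0 fields from both runs are already available as NumPy arrays with finite values; the helper name `compare_fields`, the field names, and the tolerances are hypothetical choices for illustration, not the exact procedure used here.

```python
import numpy as np


def compare_fields(cpu, gpu, rtol=1e-10, atol=1e-12):
    """For fields present in both runs, return (max abs difference, match flag)."""
    result = {}
    for name in sorted(set(cpu) & set(gpu)):
        a, b = np.asarray(cpu[name]), np.asarray(gpu[name])
        max_diff = float(np.max(np.abs(a - b)))
        matches = bool(np.allclose(a, b, rtol=rtol, atol=atol))
        result[name] = (max_diff, matches)
    return result


# Synthetic stand-ins for the two runs; tolerances above are assumptions.
cpu_fields = {"velocityx": np.linspace(0.0, 4.0, 8)}
gpu_fields = {"velocityx": cpu_fields["velocityx"] + 1e-14}

for name, (diff, ok) in compare_fields(cpu_fields, gpu_fields).items():
    print(name, diff, "consistent" if ok else "DIFFERS")
```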

==============================================================================
Step: 1 dt: 0.5 Time: 0 to 0.5
CFL: 0.308108 (conv: 0.308108 diff: 0 src: 0 )

L-inf norm summary: before predictor step
..............................................................................
Level 0
  velocity                 4.065279426         12.32432027                   0
  gp                                 0                   0                   0
  temperature                      305
  tke                              0.1
..............................................................................

Godunov:
  System                     Iters      Initial residual        Final residual
  ----------------------------------------------------------------------------

L-inf norm MAC vels: before MAC projection
..............................................................................
Max u:           -100000000 |  Location (x,y,z):          0,          0,          0
Min u:                 -inf |  Location (x,y,z):          0,          0,          0
Max v:                  inf |  Location (x,y,z):          0,          0,          0
Min v:            100000000 |  Location (x,y,z):          0,          0,          0
Max w:             0.040875 |  Location (x,y,z):        950,        950,        940
Min w:                    0 |  Location (x,y,z):        950,        950,        960
..............................................................................

moprak-nrel commented 1 month ago

In my test runs, I can reliably get the MMC forcing to work on GPUs with #1085. @lawrenceccheung, please re-open if this still hasn't fixed it.