Exawind / amr-wind

AMReX-based structured wind solver
https://exawind.github.io/amr-wind

GPU Unit test Failures: DiagnosticsTest.Max_Vel_MultiLevel and Max_MACvel_MultiLevel #1063

Closed by moprak-nrel 1 month ago

moprak-nrel commented 1 month ago

Bug description

On a CUDA GPU, there are two unit test failures, with the following error:

[ RUN      ] DiagnosticsTest.Max_Vel_MultiLevel

L-inf norm vels: cell-centered
..............................................................................
Max u:         0.9891493056 |  Location (x,y,z): 0.1041666667, 4.895833333,      1.875
Min u:              -100000 |  Location (x,y,z):         -5,         -5,         -2
Max v:               100000 |  Location (x,y,z):         -5,         -5,         -2
Min v:            -0.984375 |  Location (x,y,z): 4.895833333, 4.895833333,      0.125
Max w:          4.972897784 |  Location (x,y,z): 4.895833333, 0.1041666667,      1.875
Min w:              -100000 |  Location (x,y,z):         -5,         -5,         -2
..............................................................................
[ RUN      ] DiagnosticsTest.Max_MACvel_MultiLevel

L-inf norm MAC vels: face-centered
..............................................................................
Max u:                    1 |  Location (x,y,z): 2.220446049e-16, 4.895833333,      1.875
Min u:              -100000 |  Location (x,y,z):         -5,         -5,         -2
Max v:               100000 |  Location (x,y,z):         -5,         -5,         -2
Min v:            -0.984375 |  Location (x,y,z): 4.895833333,          5,      0.125
Max w:          2.988503539 |  Location (x,y,z): 4.895833333, 3.229166667,          2
Min w:              -100000 |  Location (x,y,z):         -5,         -5,         -2
..............................................................................

Both of these could come from the values set here: https://github.com/Exawind/amr-wind/blob/d7e7b8124ad04454a0f158c24f4f3b617cab6f2e/unit_tests/utilities/test_diagnostics.cpp#L30-L35. I'm wondering whether this is related to #1050.
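To illustrate the kind of failure that would produce output like the above, here is a standalone sketch. It is not AMR-Wind code and not a confirmed diagnosis: `kSentinel`, `find_max`, and `make_field` are hypothetical names, and the idea that a ±100000 placeholder is left in a cell that never gets overwritten is only an assumption based on the log.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placeholder value, chosen to match the 100000 seen in the log.
constexpr double kSentinel = 1.0e5;

struct Extremum {
    double value;       // largest value found
    std::size_t index;  // flat index of the cell holding it
};

// Plain max-with-location scan over a flat array of cell values.
Extremum find_max(const std::vector<double>& cells) {
    assert(!cells.empty());
    Extremum best{cells[0], 0};
    for (std::size_t i = 1; i < cells.size(); ++i) {
        if (cells[i] > best.value) {
            best = {cells[i], i};
        }
    }
    return best;
}

// Fill every cell with the placeholder, then overwrite all cells except
// `skipped` with a small physical value. Passing skipped >= n overwrites
// every cell, which models the expected (passing) behaviour.
std::vector<double> make_field(std::size_t n, std::size_t skipped) {
    std::vector<double> cells(n, kSentinel);
    for (std::size_t i = 0; i < n; ++i) {
        if (i != skipped) {
            cells[i] = 0.5 + 0.01 * static_cast<double>(i);
        }
    }
    return cells;
}
```

With every cell overwritten, `find_max` returns a physical value. If a single cell is missed, the placeholder wins the reduction and its location is reported instead, which is the pattern in the log.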

Steps to reproduce

Steps to reproduce the behavior:

  1. Compiler used
    • [x] GCC
    • [ ] LLVM
    • [ ] oneapi (Intel)
    • [ ] nvcc (NVIDIA)
    • [ ] rocm (AMD)
    • [ ] with MPI
    • [ ] other:
  2. Operating system
    • [x] Linux
    • [ ] OSX
    • [ ] Windows
    • [ ] other (do tell ;)):
  3. Hardware:
    • [ ] CPU
    • [x] GPU
  4. Machine details:

    ellis, with the following modules loaded:

        MODULES_DATE=2024-05-01
        source /data/ssd1/software/${MODULES_DATE}/env.sh
        module load gcc
        module load binutils
        module load git
        module load cmake
        module load cuda/12.2.2



## AMR-Wind information
`main` branch with [d7e7b81](https://github.com/Exawind/amr-wind/commit/d7e7b8124ad04454a0f158c24f4f3b617cab6f2e)

mbkuhn commented 1 month ago

I'll look into it, since I wrote the unit test in question.