Exawind / amr-wind

AMReX-based structured wind solver
https://exawind.github.io/amr-wind

Offshore boundary plane failing time tolerance #1342

Open rybchuk opened 2 days ago

rybchuk commented 2 days ago

Bug description

Likely related to this PR, and I am using the most recent commit on main, 740679c8. I'm running an offshore ABL simulation with linear waves. I have achieved a spun-up state at iteration 80000, and now I am trying to run an inflow-outflow simulation. I believe that I have successfully written out boundary planes using the following:

ABL.bndry_file                           = bndry_file.native
ABL.bndry_io_mode                        = 0
ABL.bndry_planes                         = ylo xlo
ABL.bndry_output_start_time              = 1.0
ABL.bndry_var_names                      = velocity_mueff velocity density p velocity_src_term gp vof_mueff levelset interface_curvature vof vof_src_term temperature_mueff temperature temperature_src_term ow_levelset ow_vof ow_velocity

(I know that I'm probably saving out too many boundary variables, but I saw the same error with = velocity temperature vof)

I then kick off the inflow-outflow simulation using the following variables:

ABL.bndry_file                           = ../write_lev0/bndry_file.native
ABL.bndry_io_mode                        = 1
ABL.bndry_planes                         = ylo xlo
ABL.bndry_output_start_time              = 1.0
ABL.bndry_var_names                      = velocity_mueff velocity density p velocity_src_term gp vof_mueff levelset interface_curvature vof vof_src_term temperature_mueff temperature temperature_src_term ow_levelset ow_vof ow_velocity # velocity temperature vof

The simulation errors out during the first timestep.

Steps to reproduce

I have shared all the files needed to run the bc-write simulation and bc-read simulation in /scratch/orybchuk/share/debug_inflow_outflow on Kestrel. I've uploaded many of the files (aside from the large ones) here too.

debug_inflow_outflow.zip

Steps to reproduce the behavior:

  1. Compiler used
    • Oneapi
  2. Operating system
    • Linux
  3. Hardware:
    • CPU
  4. Machine details: Kestrel
    Currently Loaded Modules:
    1) intel/2023.2.0   7) cray-libsci/22.10.1.2
    2) craype/2.7.30        8) PrgEnv-intel/8.5.0
    3) cray-dsmml/0.2.2     9) cray-python/3.11.5
    4) libfabric/1.15.2.0  10) binutils/2.41
    5) craype-network-ofi  11) intel-oneapi-compilers/2023.2.0
    6) cray-mpich/8.1.28
  5. Input file attachments: See the .zip file.
  6. Error (paste or attach):
    
    ==============================================================================
    Step: 80011 dt: 0.05 Time: 6542.942863 to 6542.992863
    CFL: 0.402303 (conv: 0.402303 diff: 0 src: 0 )

    Godunov: System Iters Initial residual Final residual

    terminate called after throwing an instance of 'std::runtime_error' ... what(): Assertion `std::abs(time - m_in_data.tinterp()) < 1e-12' failed, file "/kfs2/projects/ai4wind/orybchuk/exawind-manager/environments/ai4wind-nov24/amr-wind/amr-wind/wind_energy/ABLBoundaryPlane.cpp", line 896

  7. If this is a segfault, a stack trace from a debug build (paste or attach): N/A

Expected behavior

The bc-read simulation should run past the first timestep without crashing.

AMR-Wind information

==============================================================================
AMR-Wind (https://github.com/exawind/amr-wind)

AMR-Wind version :: v3.2.0-8-g740679c8-DIRTY
AMR-Wind Git SHA :: 740679c87925db41521f477e8724daa0c69cc670-DIRTY
AMReX version    :: 24.09-45-g6d9c25b989f1

Exec. time       :: Mon Nov 11 10:48:20 2024
Build time       :: Nov 11 2024 10:17:52
C++ compiler     :: IntelLLVM 2023.2.0

MPI              :: ON (Num. ranks = 384)
GPU              :: OFF
OpenMP           :: OFF

Enabled third-party libraries:
  NetCDF 4.9.2

This software is released under the BSD 3-clause license.
See https://github.com/Exawind/amr-wind/blob/development/LICENSE for details.


Additional context

I tried initializing the bc-read simulation from `chk80000` and `chk80010`, and it fails for both checkpoints.

For the record, the timing of `chk80010` looks like `Step: 80011 dt: 0.05 Time: 6542.942863 to 6542.992863`, and the first few timestamps of the BC are:

80000 6542.4428627258721
80001 6542.5721400248704
80002 6542.6563112443018


so the BC definitely has data before the IC. The precursor simulation was run with an adaptive timestep, but the bc-write and bc-read simulations are currently being run with a small fixed timestep. I was also hitting the same error with a version of AMR-Wind that I compiled from `main` in October.
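
The claim that the boundary file covers the restart time can be checked mechanically. Below is a minimal sketch, assuming the timestamps have already been read into a sorted vector; `bracketing_index` is a hypothetical helper for illustration and is not part of AMR-Wind. Only the first three timestamps are quoted above, so a real check would need the full list from the boundary file.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not an AMR-Wind function): returns the index i such
// that times[i] <= t <= times[i + 1], or -1 if t lies outside the recorded
// range. Assumes `times` is sorted in ascending order.
long bracketing_index(const std::vector<double>& times, double t)
{
    assert(std::is_sorted(times.begin(), times.end()));
    if (times.size() < 2 || t < times.front() || t > times.back()) {
        return -1;
    }
    // First entry strictly greater than t; the entry before it is <= t.
    auto upper = std::upper_bound(times.begin(), times.end(), t);
    if (upper == times.end()) {
        --upper; // t equals the last recorded time
    }
    return static_cast<long>(upper - times.begin()) - 1;
}
```

A return value of -1 for the simulation start time would point to a coverage problem in the boundary data. A non-negative value suggests the data brackets the requested time and the failure lies elsewhere.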

rybchuk commented 1 day ago

I did some more investigating, but have not been able to figure out a quick fix. I changed the time tolerance from 1e-12 to the aggressive 1e-4, and that didn't work.

  what():  Assertion `std::abs(time - m_in_data.tinterp()) < 1e-4' failed, file "/kfs2/projects/ai4wind/orybchuk/exawind-manager/environments/ai4wind-nov24/amr-wind/amr-wind/wind_energy/ABLBoundaryPlane.cpp", line 900

I also added some print statements into ABLBoundaryPlane.cpp to see the values of everything:

    amrex::Print() << "time: " << time << std::endl;
    amrex::Print() << "m_in_data.tn(): " << m_in_data.tn() << std::endl;
    amrex::Print() << "m_in_data.tnp1(): " << m_in_data.tnp1() << std::endl;
    amrex::Print() << "m_in_data.tinterp(): " << m_in_data.tinterp() << std::endl;
    amrex::Print() << "std::abs(time - m_in_data.tinterp()): " << std::abs(time - m_in_data.tinterp()) << std::endl;
    AMREX_ALWAYS_ASSERT(
        ((m_in_data.tn() <= time) || (time <= m_in_data.tnp1())));
    AMREX_ALWAYS_ASSERT(std::abs(time - m_in_data.tinterp()) < 1e-6);

Surprisingly, the values of time and m_in_data.tinterp() are printed out as identical and the difference truly appears to be 0:

time: 6542.942863
m_in_data.tn(): -1
m_in_data.tnp1(): -1
m_in_data.tinterp(): 6542.942863
std::abs(time - m_in_data.tinterp()): 0

I assume that m_in_data.tn() and m_in_data.tnp1() are supposed to be -1?


One more update: if I comment out AMREX_ALWAYS_ASSERT(std::abs(time - m_in_data.tinterp()) < 1e-6);, then my simulation does run through to timestepping. It's not yet clear whether I'm seeing any unphysical artifacts.

mbkuhn commented 1 day ago

I'm confident I found the problem. The time passed to a vof fillpatch (within the advection step) is not valid, leading to a problem with this assertion. I'm not sure what the best solution is, but to confirm this is the problem, you could try running without vof as one of the bndry variables. This would also explain why this problem is showing up for offshore ABLs in particular.
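
To illustrate why loosening the tolerance did not help, here is a simplified stand-in for the check. This is a sketch, not the AMR-Wind implementation: `InDataSketch`, `time_matches`, and `require_time_match` are hypothetical names, and the invalid time of -1.0 used in the usage note is a placeholder chosen for illustration, since the thread does not state which value the vof fillpatch actually passes.

```cpp
#include <cassert>
#include <cmath>

// Simplified stand-in for the boundary-plane input data; only the
// interpolation time is modeled. This is not the AMR-Wind class.
struct InDataSketch
{
    double t_interp;
    double tinterp() const { return t_interp; }
};

// Same shape as the failing check: the time requested by a fill must match
// the time the boundary data was last interpolated to, within a tolerance.
bool time_matches(const InDataSketch& in_data, double time, double tol = 1e-12)
{
    return std::abs(time - in_data.tinterp()) < tol;
}

// Aborts on a mismatch (in builds without NDEBUG), similar in spirit to
// wrapping the comparison in AMREX_ALWAYS_ASSERT.
void require_time_match(const InDataSketch& in_data, double time)
{
    assert(time_matches(in_data, time));
}
```

With the data interpolated to 6542.942863, a fill at that same time passes, while a fill that hands in an unrelated time such as -1.0 fails for any reasonable tolerance, 1e-4 included. That would be consistent with the printed difference being 0 on one call while the assertion still fires on another.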

rybchuk commented 1 day ago

Yep, I can confirm! If I run with my code that was compiled back in October and I set ABL.bndry_var_names = velocity temperature, the code makes it through the first few timesteps without any errors.

mbkuhn commented 1 day ago

Putting these here for myself and other developers:

https://github.com/Exawind/amr-wind/blob/main/amr-wind/equation_systems/vof/SplitAdvection.cpp#L102
https://github.com/Exawind/amr-wind/blob/main/amr-wind/equation_systems/vof/vof_ops.H#L73

Both of these would lead to breaking the assertion. The second one is easy to fix. The first one is a problem for two reasons:

  1. easier issue - need to pass in the time to SplitAdvection so it can have a valid value.
  2. harder issue - vof boundaries need to be at n, not n+1/2, defying the new way I set up the boundary fill times. Probably will need some type of specific exception to leave the boundaries at n while still doing fillpatch stuff elsewhere, kind of like what I did for the sibling fields fill.
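
One way to picture the exception described in the second item is a per-field choice of boundary fill time. The sketch below is purely hypothetical: `boundary_fill_time` does not exist in AMR-Wind, and it assumes that other fields take their boundary data at n+1/2 during advection while vof stays at n.

```cpp
#include <cassert>
#include <string>

// Hypothetical policy (not AMR-Wind code): pick the time at which boundary
// data is evaluated for a given field during the advection step.
// Assumption: vof keeps its boundaries at the old time n, while every other
// field uses the half step n+1/2.
double boundary_fill_time(const std::string& field, double t_n, double dt)
{
    assert(dt > 0.0);
    if (field == "vof") {
        return t_n;
    }
    return t_n + 0.5 * dt;
}
```

For the step shown in the error log (dt = 0.05 starting at 6542.942863), this policy would request vof boundary data at 6542.942863 and velocity boundary data at 6542.967863. Whether the actual fix takes this form is up to the developers.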

mbkuhn commented 1 day ago

When you get a chance, could you also check with the more recent code, just taking vof out of the input bndry variable names? No hurry.

rybchuk commented 1 day ago

It's a quick thing for me to check :) I uncommented the AMREX_ALWAYS_ASSERT in yesterday's code, and I can confirm that it also runs through the first few timesteps without issue. And the code fails if I add vof back into the BCs.