Exawind / amr-wind

AMReX-based structured wind solver
https://exawind.github.io/amr-wind

Offshore boundary plane failing time tolerance #1342

Status: Open · rybchuk opened this issue 2 weeks ago

rybchuk commented 2 weeks ago

Bug description

This is likely related to this PR. I am using the most recent commit on main, 740679c8. I'm running an offshore ABL simulation with linear waves. I have achieved a spun-up state at iteration 80000, and now I am trying to run an inflow-outflow simulation. I believe that I have successfully written out boundary planes using the following:

ABL.bndry_file                           = bndry_file.native
ABL.bndry_io_mode                        = 0
ABL.bndry_planes                         = ylo xlo
ABL.bndry_output_start_time              = 1.0
ABL.bndry_var_names                      = velocity_mueff velocity density p velocity_src_term gp vof_mueff levelset interface_curvature vof vof_src_term temperature_mueff temperature temperature_src_term ow_levelset ow_vof ow_velocity

(I know that I'm probably saving out too many boundary variables, but I saw the same error with `ABL.bndry_var_names = velocity temperature vof`)

I then kick off the inflow-outflow simulation using the following variables:

ABL.bndry_file                           = ../write_lev0/bndry_file.native
ABL.bndry_io_mode                        = 1
ABL.bndry_planes                         = ylo xlo
ABL.bndry_output_start_time              = 1.0
ABL.bndry_var_names                      = velocity_mueff velocity density p velocity_src_term gp vof_mueff levelset interface_curvature vof vof_src_term temperature_mueff temperature temperature_src_term ow_levelset ow_vof ow_velocity # velocity temperature vof

The simulation errors out during the first timestep.

Steps to reproduce

I have shared all the files needed to run the bc-write simulation and bc-read simulation in /scratch/orybchuk/share/debug_inflow_outflow on Kestrel. I've uploaded many of the files (aside from the large ones) here too.

debug_inflow_outflow.zip

Steps to reproduce the behavior:

  1. Compiler used
    • Oneapi
  2. Operating system
    • Linux
  3. Hardware:
    • CPU
  4. Machine details: Kestrel
     Currently Loaded Modules:
       1) intel/2023.2.0       7) cray-libsci/22.10.1.2
       2) craype/2.7.30        8) PrgEnv-intel/8.5.0
       3) cray-dsmml/0.2.2     9) cray-python/3.11.5
       4) libfabric/1.15.2.0  10) binutils/2.41
       5) craype-network-ofi  11) intel-oneapi-compilers/2023.2.0
       6) cray-mpich/8.1.28
  5. Input file attachments: See the .zip file.
  6. Error (paste or attach):
    
    ==============================================================================
    Step: 80011 dt: 0.05 Time: 6542.942863 to 6542.992863
    CFL: 0.402303 (conv: 0.402303 diff: 0 src: 0 )

Godunov: System Iters Initial residual Final residual

terminate called after throwing an instance of 'std::runtime_error' ... what(): Assertion `std::abs(time - m_in_data.tinterp()) < 1e-12' failed, file "/kfs2/projects/ai4wind/orybchuk/exawind-manager/environments/ai4wind-nov24/amr-wind/amr-wind/wind_energy/ABLBoundaryPlane.cpp", line 896

7. If this is a segfault, a stack trace from a debug build (paste or attach):
N/A

## Expected behavior
The bc-read simulation should run past the first timestep without crashing.

## AMR-Wind information

==============================================================================
                AMR-Wind (https://github.com/exawind/amr-wind)

AMR-Wind version :: v3.2.0-8-g740679c8-DIRTY
AMR-Wind Git SHA :: 740679c87925db41521f477e8724daa0c69cc670-DIRTY
AMReX version    :: 24.09-45-g6d9c25b989f1

Exec. time       :: Mon Nov 11 10:48:20 2024
Build time       :: Nov 11 2024 10:17:52
C++ compiler     :: IntelLLVM 2023.2.0

MPI              :: ON (Num. ranks = 384)
GPU              :: OFF
OpenMP           :: OFF

Enabled third-party libraries: NetCDF 4.9.2

       This software is released under the BSD 3-clause license.

See https://github.com/Exawind/amr-wind/blob/development/LICENSE for details.


## Additional context
I tried initializing the bc-read simulation from `chk80000` and `chk80010`, and it fails for both checkpoints.

For the record, the timing of `chk80010` looks like `Step: 80011 dt: 0.05 Time: 6542.942863 to 6542.992863`, and the first few times stored in the BC file are:

80000 6542.4428627258721
80001 6542.5721400248704
80002 6542.6563112443018


so the BC definitely has data before the IC. The precursor simulation was run with an adaptive timestep, but the bc-write and bc-read simulations are currently being run with a small fixed timestep. I was also hitting the same error with a version of AMR-Wind that I compiled from `main` in October.
rybchuk commented 1 week ago

I did some more investigating, but have not been able to figure out a quick fix. I loosened the time tolerance from 1e-12 all the way to 1e-4, and that didn't work.

  what():  Assertion `std::abs(time - m_in_data.tinterp()) < 1e-4' failed, file "/kfs2/projects/ai4wind/orybchuk/exawind-manager/environments/ai4wind-nov24/amr-wind/amr-wind/wind_energy/ABLBoundaryPlane.cpp", line 900

I also added some print statements into ABLBoundaryPlane.cpp to see the values of everything:

    // Debug prints inserted just before the failing assertion in
    // ABLBoundaryPlane.cpp (amrex::Print reports the I/O rank only):
    amrex::Print() << "time: " << time << std::endl;
    amrex::Print() << "m_in_data.tn(): " << m_in_data.tn() << std::endl;
    amrex::Print() << "m_in_data.tnp1(): " << m_in_data.tnp1() << std::endl;
    amrex::Print() << "m_in_data.tinterp(): " << m_in_data.tinterp() << std::endl;
    amrex::Print() << "std::abs(time - m_in_data.tinterp()): " << std::abs(time - m_in_data.tinterp()) << std::endl;
    AMREX_ALWAYS_ASSERT(
        ((m_in_data.tn() <= time) || (time <= m_in_data.tnp1())));
    AMREX_ALWAYS_ASSERT(std::abs(time - m_in_data.tinterp()) < 1e-6);

Surprisingly, the values of time and m_in_data.tinterp() are printed out as identical and the difference truly appears to be 0:

time: 6542.942863
m_in_data.tn(): -1
m_in_data.tnp1(): -1
m_in_data.tinterp(): 6542.942863
std::abs(time - m_in_data.tinterp()): 0

I assume that m_in_data.tn() and m_in_data.tnp1() are supposed to be -1?


One more update: if I comment out `AMREX_ALWAYS_ASSERT(std::abs(time - m_in_data.tinterp()) < 1e-6);`, then my simulation does run through to timestepping. It's unclear yet whether I'm seeing any unphysical artifacts.

mbkuhn commented 1 week ago

I'm confident I found the problem. The time passed to a vof fillpatch (within the advection step) is not valid, leading to a problem with this assertion. I'm not sure what the best solution is, but to confirm this is the problem, you could try running without vof as one of the bndry variables. This would also explain why this problem is showing up for offshore ABLs in particular.

rybchuk commented 1 week ago

Yep, I can confirm! If I run with my code that was compiled back in October and I set ABL.bndry_var_names = velocity temperature, the code makes it through the first few timesteps without any errors.

mbkuhn commented 1 week ago

Putting these here for myself and other developers: https://github.com/Exawind/amr-wind/blob/main/amr-wind/equation_systems/vof/SplitAdvection.cpp#L102 https://github.com/Exawind/amr-wind/blob/main/amr-wind/equation_systems/vof/vof_ops.H#L73

Both of these can break the assertion. The second one is easy to fix. The first one is a problem for two reasons:

  1. easier issue - need to pass in the time to SplitAdvection so it can have a valid value.
  2. harder issue - vof boundaries need to be at n, not n+1/2, defying the new way I set up the boundary fill times. Probably will need some type of specific exception to leave the boundaries at n while still doing fillpatch stuff elsewhere, kind of like what I did for the sibling fields fill.
mbkuhn commented 1 week ago

When you get a chance, could you also check with the more recent code, and just taking vof out of the input bndry variable names? No hurry

rybchuk commented 1 week ago

It's a quick thing for me to check :) I re-enabled (un-commented) the AMREX_ALWAYS_ASSERT with yesterday's code, and I can confirm that that also runs through the first few timesteps without issue. And the code fails again if I add vof back into the BCs.

hgopalan commented 1 week ago

Adding to our previous discussion, I randomly get a code crash at :

AMREX_ALWAYS_ASSERT(
    ((m_in_data.tn() <= time) || (time <= m_in_data.tnp1())));

It happens on both CPUs and GPUs. I would say 10% of my cases see this issue.

mbkuhn commented 1 week ago

@hgopalan could you make a separate issue with a testable example? That's a different problem than what Alex is seeing.

rybchuk commented 5 days ago

So... when I submitted the issue, I had unfortunately set up my inflow-outflow simulation incorrectly. While I set ABL.bndry_io_mode = 1, I was also running with geometry.is_periodic = 1 1 0. As a result, it's unclear whether I was accidentally running my inflow-outflow simulation as a periodic simulation.

I'm now trying to run the inflow-outflow simulation with:

ABL.bndry_io_mode                        = 1
ABL.bndry_planes                         = xlo
...
geometry.is_periodic                     = 0   1   0

xlo.type                                 = mass_inflow
xlo.density                              = 1.027
xlo.temperature                          = 265.0
xlo.tke                                  = 0.0
xhi.type                                 = pressure_outflow

The simulation now errors out with

AMReX (24.09-45-g6d9c25b989f1) initialized
Initializing AMR-Wind ...
ParmParse::sgetarr ParmParse::sgetarr(): xlo.vof not found in table

This error is being reported by amrex/Src/Base/AMReX_ParmParse.cpp. Do you have any suggestions on what to do about xlo.vof? I don't see any examples of this line in the repo. Also, what should I do about xlo.density when MultiPhase expects two densities? And should xhi.type still be pressure_outflow?

rybchuk commented 5 days ago

Harish suggested setting xlo.vof = 0 (where 0 is a dummy value), and my simulation is now running again :) I hope to have results soon with a longer inflow-outflow simulation
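For anyone hitting the same ParmParse error, the workaround amounts to adding a placeholder vof entry to the bc-read input file alongside the other xlo values quoted above (the 0 is a dummy; the actual boundary values come from the bndry file):

```
xlo.type         = mass_inflow
xlo.density      = 1.027
xlo.temperature  = 265.0
xlo.tke          = 0.0
xlo.vof          = 0.0    # dummy value to satisfy ParmParse
```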

mbkuhn commented 5 days ago

Thanks, @hgopalan, that's a great suggestion for this scenario

rybchuk commented 5 days ago

I ran the simulation out for 200 iterations (10 seconds). Based on the vof field by the end of the simulation, the numerical beach is indeed damping the waves. Here's pressure at that last step too.

Unfortunately, u, v, w, and T look funny at the outflow. This feels like I botched something with the inflow-outflow .i file though. I haven't run inflow-outflow simulations in a bit, so let me confirm that I remember how to properly run them on land.

mbkuhn commented 5 days ago

Are you using ABL Mean Boussinesq in there too?

rybchuk commented 6 hours ago

Nope, good catch.

I've added ABL Mean Boussinesq in there. I've also run the calc_inflow_stats.py script but commented out all the ABL lines because these values feel like they're specific to land-based ABLs with Monin-Obukhov.

ICNS.source_terms                        = BoussinesqBuoyancy CoriolisForcing GeostrophicForcing GravityForcing ABLMeanBoussinesq
##--------- Additions by calc_inflow_stats.py ---------#
# ABL.wall_shear_stress_type = "local"
# ABL.inflow_outflow_mode = true
# ABL.wf_velocity = -0.04061319424974897 0.01560499433783477
# ABL.wf_vmag = 0.05284818547325939
# ABL.wf_theta = 265.0000040814566
# BodyForce.magnitude = 0.0 0.0 0.0
BoussinesqBuoyancy.read_temperature_profile = true
BoussinesqBuoyancy.tprofile_filename = avg_theta.dat
##-----------------------------------------------------#

After making that change, my inflow-outflow simulation ran for the full requested two minutes without any obvious issues (u, v, w, T)!

Whereas the above simulations didn't have any refinement zones, I also ran ~20 seconds of an inflow-outflow simulation with two levels of refinement. The u, v, w, and T fields also look good there!

I've attached the read and write .i files below for posterity. offshore_io.zip


I'm going to do some more futzing around now, where I take BCs from a spun-up simulation and try to drive a simulation with quiescent ICs, but that's not relevant to this issue. The VoF fillpatch workaround that I've been using looks like it works great for the time being, so there's no pressing need from me to think about the VoF fillpatch code more rigorously. Thanks for the help!

marchdf commented 2 hours ago

That's awesome news @rybchuk . Good to close the issue then?