Exawind / amr-wind

AMReX-based structured wind solver
https://exawind.github.io/amr-wind

netCDF boundary plane write regression test fails after modifications #1380

Open rybchuk opened 3 days ago

rybchuk commented 3 days ago

Bug description

I have a simulation with a refinement zone that touches the boundary, and I want to write out the BCs for this simulation to netCDF, but it's failing. Marc pointed out that there's a regression test that does the same thing, and that test succeeds. However, if I make some reasonable modifications to it, the simulation fails with the following message:

Shear Stress model: moeng
Heat Flux model: moeng
Creating output NetCDF file: bndry_file.nc
NetCDF file created successfully: bndry_file.nc

Writing NetCDF file bndry_file.nc at time 0
terminate called after throwing an instance of 'std::runtime_error'
  what():  Encountered NetCDF error; aborting
srun: error: x1006c0s4b1n1: task 177: Aborted (core dumped)

Here is the diff on abl.i:

9c9
< time.fixed_dt         =   0.5        # Use this constant dt if > 0
---
> time.fixed_dt         =   0.01 # 0.5        # Use this constant dt if > 0
52c52
< ABL.bndry_output_start_time = 2.0
---
> ABL.bndry_output_start_time = 0.0 # 2.0
58c58
< amr.n_cell              = 48 48 48    # Grid cells at coarsest AMRlevel
---
> amr.n_cell              = 320 320 240 # 48 48 48    # Grid cells at coarsest AMRlevel

I've also changed static_box.txt:

1,7c1,3
< 2
< 2
< -100 -10 -100 1100 10 300
< -10 -100 -100 10 1100 300
< 2
< -100 -10 -100 1100 10 300
< -10 -100 -100 10 1100 300
---
> 1
> 1
> 0.0 0.0 100.0 1000.0 1000.0 150.0

Side note: It looks weird to me that the original static_box.txt has refinement zones that start at x=-100 even though xlo=0 in these simulations.

Other notes

Steps to reproduce

Steps to reproduce the behavior:

  1. Compiler used

    • [x] oneapi (Intel)
  2. Operating system

    • [x] Linux
  3. Hardware:

    • [x] CPU
  4. Machine details: Kestrel

    Currently Loaded Modules:
    1) intel/2023.2.0   7) cray-libsci/22.10.1.2
    2) craype/2.7.30        8) PrgEnv-intel/8.5.0
    3) cray-dsmml/0.2.2     9) cray-python/3.11.5
    4) libfabric/1.15.2.0  10) binutils/2.41
    5) craype-network-ofi  11) intel-oneapi-compilers/2023.2.0
    6) cray-mpich/8.1.28
  5. Input file attachments failed_reg_test.zip

  6. Error (paste or attach): See fail1.log in the ZIP

  7. If this is a segfault, a stack trace from a debug build (paste or attach): This looks like a netCDF error?

Expected behavior

The simulation should not crash, and it should write out a boundary condition file in netCDF format.

AMR-Wind information

==============================================================================
                AMR-Wind (https://github.com/exawind/amr-wind)

  AMR-Wind version :: v3.2.0-21-g3e9d3a8b-DIRTY
  AMR-Wind Git SHA :: 3e9d3a8b4704e98ab6691cefeea2d240d24bef3f-DIRTY
  AMReX version    :: 24.09-45-g6d9c25b989f1

  Exec. time       :: Sat Nov 30 08:28:59 2024
  Build time       :: Nov 20 2024 07:39:48
  C++ compiler     :: IntelLLVM 2023.2.0

  MPI              :: ON    (Num. ranks = 192)
  GPU              :: OFF
  OpenMP           :: OFF

  Enabled third-party libraries:
    NetCDF    4.9.2

           This software is released under the BSD 3-clause license.
 See https://github.com/Exawind/amr-wind/blob/development/LICENSE for details.
------------------------------------------------------------------------------
marchdf commented 1 day ago

Hi @rybchuk thanks (as always) for the detailed bug report. I would think this isn't due to my recent changes to the boundary planes (that was all native format stuff) but who knows... I will look into it.

rybchuk commented 1 day ago

I appreciate it, thanks man!

marchdf commented 1 day ago

Ok I think I figured it out (?).

This is what your grid looks like at 320 320 240. Notice that dark band? That's level 1, and it does not start at the bottom of the boundary because your refinement looks like:

1
1
0.0 0.0 100.0 1000.0 1000.0 150.0

[Screenshot: grid at 320 320 240, with the level-1 band starting above the bottom of the boundary]

If I switch your static box to

1
1
0.0 0.0 0.0 1000.0 1000.0 150.0

then it has no problem writing the netcdf file. Here's the grid:

[Screenshot: grid with the level-1 refinement extending down to the bottom of the boundary]

So why does it work at the lower resolution? Because you get lucky: the grid is so coarse that the regrid ends up putting level-1 cells all the way down to the bottom of the boundary (due to the magic of n_error_buf, grid sizing, etc.).

[Screenshot: the coarse 48 48 48 grid, where level-1 cells reach the bottom of the boundary]

Basically, the netCDF path does not support refinement zones that don't start at the bottom. It is an indexing issue: the writer uses the AMReX indices for the netCDF buffer writes, so when a zone starts in the middle of the domain it starts counting up there and then hits the "exceeds bounds" error you were seeing (or would have seen with a debug build). We would need to compute an offset to allow for this. I will add documentation that the netCDF path does not allow this; the native path does.
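To illustrate, here is a rough Python/netCDF4 sketch of the indexing problem and of the offset that would fix it (the zone indices, dimension, and variable names below are made up; this is not the actual writer code):

import numpy as np
from netCDF4 import Dataset

# Hypothetical level-1 zone on a boundary plane: 24 refined cells whose AMReX
# k-indices start mid-domain instead of at k = 0.
k_lo, k_hi = 32, 55
nz_level = k_hi - k_lo + 1
data = np.random.rand(nz_level)

with Dataset("bndry_sketch.nc", "w") as nc:
    nc.createDimension("nz", nz_level)
    var = nc.createVariable("velocityx", "f8", ("nz",))

    # Writing with the raw AMReX indices starts counting mid-domain and runs
    # past the variable's bounds -- the "exceeds bounds" style of failure:
    # var[k_lo : k_hi + 1] = data
    # Subtracting the zone's lower index as an offset keeps the write in range:
    offset = k_lo
    var[k_lo - offset : k_hi + 1 - offset] = data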

Did you intend to refine not starting at the bottom? Maybe you did because of the ocean? How bad would it be for you to use the native format? With the new viz thing you might be able to use standard tools to load the data. This is what the boundary plane looks like with the refinement in the middle:

[Screenshot: boundary plane with the refinement zone in the middle of the domain]

rybchuk commented 16 hours ago

Ahhh okay, thanks for the thorough analysis!

Yeah this scenario popped up because of some offshore ABL work that I'm doing, so I intentionally offset the refinement zone from z=0. My teammate plans to generate multi-level BC data with a machine learning algorithm, and it's my responsibility to get that data into a format that AMR-Wind can read. At this point, it's fairly straightforward for me to take single-level numpy data and reformat it into a netCDF file that AMR-Wind can read. The easiest approach would have been to extend that to multi-level data.
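For reference, that single-level conversion is roughly something like the sketch below (the group and variable names here are only illustrative and should be checked against a file written by the regression test, not taken as the exact schema AMR-Wind expects):

import numpy as np
from netCDF4 import Dataset

ny, nz, nt = 48, 48, 10
u_plane = np.random.rand(nt, ny, nz)  # stand-in for the ML-generated plane data

with Dataset("bndry_file.nc", "w", format="NETCDF4") as nc:
    nc.createDimension("nt", nt)
    # one group per boundary, one sub-group per AMR level (illustrative layout)
    lev = nc.createGroup("xlo").createGroup("level_0")
    lev.createDimension("ny", ny)
    lev.createDimension("nz", nz)
    var = lev.createVariable("velocityx", "f8", ("nt", "ny", "nz"))
    var[:] = u_plane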

How difficult would it be to add a feature that computes that offset? The other alternative would be for me to put together code that translates from numpy data --> AMReX native BC data, which seems possible but heinous.

marchdf commented 14 hours ago

Ok cool, thanks for the context. Yeah getting that numpy data into the native format might be a bit painful. Maybe pyamrex would make it easy?

Given that I haven't looked at that bit of code in forever... I can't answer your question. I will dig around ;)

rybchuk commented 12 hours ago

Thanks yeah, let me know. If modifying the netCDF code is too painful, this native writer could be a timely project for me while Kestrel is down next week.