NOAA-GFDL / MOM6

Modular Ocean Model
Other
27 stars 60 forks source link

OBC fails if not at the outer edge of the model domain #753

Open ThomasNeumann2 opened 4 weeks ago

ThomasNeumann2 commented 4 weeks ago

I am running a regional MOM6 setup with open boundaries. As long as the OBCs are located at the outer edge of the model (I,J = N or 0) domain, everthing runs fine. But moving the OBC inwards, the model fails after a couple of time steps. Error messages are different. Then I made a simple test. I set up a rectangle model with flat bottom and defined 4 OBCs at each model edge, W, N, E, and S. Everything works as expected: =============================================>>>>>> OBC_NUMBER_OF_SEGMENTS = 4 # OBC_SEGMENT_001 = "I=0,J=N:0,FLATHER,ORLANSKI" OBC_SEGMENT_001_DATA = "SSH=value:0.1,U=value:0.0,V=value:0,TEMP=value:7.0,SALT=value:33.0" # OBC_SEGMENT_002 = "I=N,J=0:N,FLATHER,ORLANSKI" OBC_SEGMENT_002_DATA = "SSH=value:0.1,U=value:0.0,V=value:0,TEMP=value:7.0,SALT=value:33.0" # OBC_SEGMENT_003 = "J=N,I=N:0,FLATHER,ORLANSKI" OBC_SEGMENT_003_DATA = "SSH=value:0.1,U=value:0.0,V=value:0,TEMP=value:7.0,SALT=value:33.0" # OBC_SEGMENT_004 = "J=0,I=0:N,FLATHER,ORLANSKI" OBC_SEGMENT_004_DATA = "SSH=value:0.1,U=value:0.0,V=value:0,TEMP=value:7.0,SALT=value:33.0" =============================================<<<<<<<

Moving e.g. the West OBC one grid point inwards: =============================================>>>>>>> OBC_NUMBER_OF_SEGMENTS = 4 # OBC_SEGMENT_001 = "I=1,J=N:0,FLATHER,ORLANSKI" OBC_SEGMENT_001_DATA = "SSH=value:0.1,U=value:0.0,V=value:0,TEMP=value:7.0,SALT=value:33.0" # OBC_SEGMENT_002 = "I=N,J=0:N,FLATHER,ORLANSKI" OBC_SEGMENT_002_DATA = "SSH=value:0.1,U=value:0.0,V=value:0,TEMP=value:7.0,SALT=value:33.0" # OBC_SEGMENT_003 = "J=N,I=N:1,FLATHER,ORLANSKI" OBC_SEGMENT_003_DATA = "SSH=value:0.1,U=value:0.0,V=value:0,TEMP=value:7.0,SALT=value:33.0" # OBC_SEGMENT_004 = "J=0,I=1:N,FLATHER,ORLANSKI" OBC_SEGMENT_004_DATA = "SSH=value:0.1,U=value:0.0,V=value:0,TEMP=value:7.0,SALT=value:33.0" # MASK_OUTSIDE_OBCS = True =================================================<<<<<<

MOM6 fails with: FATAL from PE 0: NaN detected: u radiationOBCs: OBC%tres[xy]_001

The error message is independent on which OBC I move and independent on the number of grid points.

For the realistic setup, the error message was different and depends on the concrete setting.

Has anybody else detected this behavior or has running a regional model with OBCs inside the model domain?

kshedstrom commented 4 weeks ago

I tried moving the "west" boundary in my Bering domain two grids east and everything was fine. Moving the "east" boundary two grids west also works (the ice is in the east and doesn't even know about those two grids being masked out). This is a short test of three hours.

When does yours blow up? Can you put the MOM_input of the idealized case here or does it depend on other files?

ThomasNeumann2 commented 4 weeks ago

Thanks for trying out. The model blows up almost immediately, a few time steps. Now I am a bit puzzled. (1) I manipulated my coastal model by using a flat bathymetry in both MOM and SIS. (2) I started from scratch and made a new mosaic with flat bathy and less grid points. In (1) all grid points outside the OBC are masked out (wet in ocean_geometry), but not in SIS In (2) only one row/column outside the OBC is masked out and the other points are wet. Similar as you discovered. Up to now I did not found the difference which caused the different masking. However, both models fail with FATAL from PE 0: NaN detected: u radiationOBCs: OBC%tres[xy]_001 after a few time steps.

I append MOM*all and MOM_layout.

MOM_parameter_doc.layout.txt MOM_parameter_doc.all.txt

kshedstrom commented 4 weeks ago

You have to have land outside all OBCs, all the way to the edge. Mine was wrong because I was using a MASKING_DEPTH, but hadn't taught MASK_OUTSIDE_OBCS about it. I suggest you get rid of your MASKING_DEPTH line.

ThomasNeumann2 commented 4 weeks ago

Yes, this is the difference between my cses (1) and (2). No MASKING_DEPTH or MASKING_DEPTH = 0.0 makes all points outside the OBCs to land. But still it blows up . Maybe you can provide me with tar ball of your MOM source? The I can compare whether I have a different development stage in use.

kshedstrom commented 4 weeks ago

My sources are here: https://github.com/ESMG/MOM6/tree/fix_obc_maskingdepth

My Bering Sea ascii files are in: https://github.com/ESMG/Bering/tree/main/test_short

ThomasNeumann2 commented 4 weeks ago

Thanks for the code, but with your MOM6 the bug remains. I wonder which SIS2 version you are unsing. I am trying out step by step your settings. But some options are not in the code I am using, e.g.: DO_DYNAMICS, USE_TRIPOLAR_GEOLONB_BUG I use this repository: https://github.com/NOAA-GFDL/SIS2.git

ThomasNeumann2 commented 3 weeks ago

I don't think that SIS is the problem. I switched SIS and flux of in input.nml &coupler_nml months = 0, days = 0, hours = 1, current_date = 1990,1,1,0,0,0, calendar = 'gregorian', dt_cpld = 3600, dt_atmos = 3600, do_atmos = .false., do_land = .false., do_ice = .false., do_ocean = .true., do_flux = .false., atmos_npes = 0,
concurrent = .false.
use_lag_fluxes=.false.
check_stocks = 0 do_chksum = .false. /

The model blows up after 1 to 2 hours.

ThomasNeumann2 commented 3 weeks ago

The run time depends on time step setting, but only on DT_THERM. The longer DT_THERM the longer the model runs. For a combination DT=600, and DT_THERM=7200, it runs about 18 hours, for DT_THERM=3600 9 hours. That is, always 8 DT_THERM time steps. I tested several other DT_THERMs. The barotropic waves develop nicely until NaNs develop at the OBC and spread into the model domain.

If I got it right, you run your model for three hours, that is for 3 DT_THERM time steps?

ThomasNeumann2 commented 3 weeks ago

Last message for today. I made an ocean_only build and it is the same problem. OBCs at the model's edge run fine but inside the model domain it blows up. Thus, we can exclude SIS and some masking problems, I think.

kshedstrom commented 3 weeks ago

OK, yes, I didn't run it long enough. I can go 14 hours, then get:

WARNING from PE    23: Bad ice data End set_ice_surface_state ;  at  172.6  72.9 or i,j,k =   62  14   0; nbad =      2 on pe   23 ; T_sfc =  -1.9034E+09, ps =   9.9999E-01, S_sfc =   3.8475E+09

[n1:1957273:0:1957273] Caught signal 8 (Floating point exception: floating-point invalid operation)
kshedstrom commented 3 weeks ago

Does your ocean_only case require any netcdf files at all? Can you make a tarball of the inputs?

ThomasNeumann2 commented 3 weeks ago

No nc files are needed. Here are my INPUT , input.nml and _table files. Sorry, the files are not very clean due to the various testing. (I renamed the tars to .tar.gz to get git accepting it) INPUT.tar.gz tables.tar.gz

I discovered the the problem my start in the barotropic solver. I diagnosed each barotropic time step and it seems that "High Frequency Predictor Barotropic SSH" will have at some time an invalid value at the obc.

The first fig shows the Predictor Barotropic SSH in time step 109 with a strong negative value. It happens just once. Fig 2 shows the time development of the predictor ssh at lat=54 for the last time steps until the model blows up. One can also see that the NaNs (blanc values in the fig) propagate inwards one grid cell each time step. After the barotropic mode is completed the corrupted ssh and barotropic velocities enter the baroclinic mode and bring the model to a fatal error. The West OBC is at I=2, the others at I,J=0,N

Maybe this is a hint where to look at?

eta_pred_hifreq eta_pred_hifreq100-119

ThomasNeumann2 commented 3 weeks ago

Ups, I forgot the color bar. the negative value is -0.9. eta_pred_hifreq

kshedstrom commented 3 weeks ago

Yes, I can run your test and see what you are seeing. The T and S values get insane inside the land mask next to the OBC. Then the equation of state can't handle it. I will see if I can track it down.

ThomasNeumann2 commented 3 weeks ago

Could it be that in MOM_dynamics_split_RK2.F90 is a missmatch between bathyT and h(:,:,:) when eta is initialized at about code line 1546. bathyT(J=OBS) = 10m, while h(i,J=OBC,k)=0.001m. This yields an eta(J=OBC) = -9.99 m, which then enters the BT solver. This is not the case if the OBC is at J=N. (maybe it is somehow outside the domain?) It is in my teste case so with the OBC now in the North.

kshedstrom commented 3 weeks ago

Nothing needs to be valid outside of the OBC. Each OBC is on a grid cell edge and the cell next to it is out of bounds for any algorithm. However, what is going on for me is that the tracer values are getting updated on the outside. I will see about submitting a fix for it.

Tracer values never get updated outside of the computational tile so the edge OBCs are safe from this problem.

kshedstrom commented 3 weeks ago

OK, checkout this version: https://github.com/NOAA-GFDL/MOM6/pull/752/files

kshedstrom commented 3 weeks ago

It is possible that eta is also getting updated outside the OBCs in a similar manner. Those values get masked out with the special value on output, so I can't see it from outside the model. I don't know how much trouble that would generate, but it can be fixed in the same way - tomorrow.

ThomasNeumann2 commented 3 weeks ago

Unfortunately the fix does not work for me. I cloned dev/gfdl and copied your updated 2 files in. However, it seems to be the same bug. Have you tried out it in the flat bathymetry ocean-only setup? It is working for you? If so, I would be keen to try your entire code,

ThomasNeumann2 commented 3 weeks ago

Oh sorry, I have to apologize. It works fine for E and W boundaries, but unfortunately not for N and S.

kshedstrom commented 3 weeks ago

OK, sorry about that. Try again.

If the eta outside changing is a problem, I alsohave a eta_outside branch.

ThomasNeumann2 commented 3 weeks ago

Thanks, now it works fine. I also downloaded your eta_outside patch. Please tell me if I am too much a pain in the neck. However, there is another issue with MASK_OUTSIDE_OBCS. It seems that it works not correctly if the OBC do not span the entire model edge. It is ok if I,J=N,0 but not inside. I tried the following test cases:

=====================================>>>>

OBC_NUMBER_OF_SEGMENTS = 2 ! test case 1 !========================= !OBC_SEGMENT_001 = "I=N-2,J=70:N-2,FLATHER,ORLANSKI" !OBC_SEGMENT_002 = "J=N-2,I=N-2:70,FLATHER,ORLANSKI" ! !========================= ! test case 2 !OBC_SEGMENT_001 = "I=0,J=N:70,FLATHER,ORLANSKI" !OBC_SEGMENT_002 = "J=N,I=30:0,FLATHER,ORLANSKI" ! !========================= ! test case 3 !OBC_SEGMENT_001 = "I=2,J=N-2:70,FLATHER,ORLANSKI" !OBC_SEGMENT_002 = "J=N-2,I=30:2,FLATHER,ORLANSKI" ! !======================== ! test case 4 !OBC_SEGMENT_001 = "I=2,J=30:2,FLATHER,ORLANSKI" !OBC_SEGMENT_002 = "J=2,I=2:30,FLATHER,ORLANSKI" ! !======================== ! test case 5 !OBC_SEGMENT_001 = "I=0,J=30:0,FLATHER,ORLANSKI" !OBC_SEGMENT_002 = "J=0,I=0:30,FLATHER,ORLANSKI" ! !======================== ! test case 6 !OBC_SEGMENT_001 = "J=2,I=70:N-2,FLATHER,ORLANSKI" !OBC_SEGMENT_002 = "I=N-2,J=2:30,FLATHER,ORLANSKI" ! !======================== ! test case 7 OBC_SEGMENT_001 = "J=0,I=70:N,FLATHER,ORLANSKI" OBC_SEGMENT_002 = "I=N,J=0:30,FLATHER,ORLANSKI" ! !############### OBC_SEGMENT_001_DATA = "SSH=value:0.1,U=value:0.0,V=value:0,TEMP=value:7.0,SALT=value:33.0" OBC_SEGMENT_002_DATA = "SSH=value:0.1,U=value:0.0,V=value:0,TEMP=value:7.0,SALT=value:33.0"

<<<===========================================

test cases 2 and 7 look ok. When the western boundary is involved and J>0 , the mask looks really strange. Sorry for approaching you again.

kshedstrom commented 3 weeks ago

I don't mind this at all, but I think your failing cases are not well formed. Having an open boundary out in the middle of the ocean doesn't work unless you build a staircase of OBCs with connecting corners from domain edge to domain edge or from land to land. The MASK_OUTSIDE routine does a flood fill of turning water to land until it hits more land or the edge. If it fills the entire domain, that is not following the rules. I will try something like this:

OBC_SEGMENT_001 = "J=70,I=N:70,FLATHER,ORLANSKI" OBC_SEGMENT_002 = "I=70,J=70:N,FLATHER,ORLANSKI"

This runs for me.

I thought of a better way to do the outside eta. If that runs, I'll add it to the main PR.

ThomasNeumann2 commented 3 weeks ago

You are right, I got it now. My regional model is running fine with all 6 OBCs. Thank you very much for your help and assistance.

kshedstrom commented 3 weeks ago

Good to hear! I will close this assuming PR #752 will somehow pass the automated testing and be accepted.