angus-g / python-ale

Interface to MOM6 regridding/remapping in Python
MIT License
0 stars 0 forks source link

Add diagnostics of individual terms #7

Closed AndyHoggANU closed 1 year ago

AndyHoggANU commented 1 year ago

We seem to have persistent along-layer gradients in the density adaptivity regions, which implies that one of the non-adaptive terms is doing work here. To diagnose what is happening here, we need to be able to output the size of each term as a diagnostic.

angus-g commented 1 year ago

I'm fairly sure we're linking to a version of MOM6 where I have indeed got those diagnostic terms calculated. Exposing them to the Python interface is a matter of allocating the diag_CS structure and associating it into the adapt_CS.

angus-g commented 1 year ago

Looks like I actually had already implemented this!

Using

The test_diags.py test shows an example of this, but you pass a list of the diagnostics you want to accelerate_ale(). As part of the tuple it returns, you get a dictionary mapping the diagnostic names to their array. I think this will correspond to the last iteration, in the accelerated case...

# single iteration
h_new, diags = pyale.do_regrid(cs, regrid_cs, dt=1.0, diags=["adapt_slope_u"])

# accelerated
h_new, t_new, s_new, diags = pyale.accelerate_ale(cs, regrid_cs, 1, dt=1.0, diags=["adapt_slope_u"])

# resuming accelerated
h_new, t_new, s_new, diags = pyale.resume_ale(cs, regrid_cs, state, 1, dt=1.0)

Diagnostics

There are diagnostics available for the key terms, and it's pretty simple to add anything else we need here. I've tried to compile all the current diagnostics and a short description of what they do and how they're generated.

adapt_denom_u, adapt_denom_v

This is the denominator of the density adaptivity term, combining the along-layer slopes with the local vertical gradient.

i_denom = hdi_sig_u + hdj_sig_u + dk_sig_u
j_denom = hdj_sig_v + hdi_sig_v + dk_sig_v

denom_u(I,j,K) = sqrt(i_denom)
denom_v(i,J,K) = sqrt(j_denom)

adapt_slope_u, adapt_slope_v

This is the slope part of the density adaptivity term, used to determine the flux of interface displacement.

dz_s_i(I,j) = hdi_sig(I,j,K) / sign(sqrt(i_denom), dk_sig_u)
dz_s_j(i,J) = hdj_sig(i,J,K) / sign(sqrt(j_denom), dk_sig_v)

adapt_coord_u, adapt_coord_v

This is the along-coordinate slope.

hdi_sig_u = hdi_sig(I,j,K)**2
hdj_sig_u = 0.25 * ((hdj_sig(i,J,K)**2 + hdj_sig(i+1,J-1,K)**2) + &
            (hdj_sig(i+1,J,K)**2 + hdj_sig(i,J-1,K)**2))
dk_sig_u = 0.5 * (dk_sig_int(i,j)**2 + dk_sig_int(i+1,j)**2)

i_denom = hdi_sig_u + hdj_sig_u + dk_sig_u

coord_u(I,j,K) = (hdi_sig_u + hdj_sig_u) / i_denom

hdj_sig_v = hdj_sig(i,J,K)**2
hdi_sig_v = 0.25 * ((hdi_sig(I,j,K)**2 + hdi_sig(I-1,j+1,K)**2) + &
            (hdi_sig(I,j+1,K)**2 + hdi_sig(I-1,j,K)**2))
dk_sig_v = 0.5 * (dk_sig_int(i,j)**2 + dk_sig_int(i,j+1)**2)

j_denom = hdj_sig_v + hdi_sig_v + dk_sig_v

coord_v(i,J,K) = (hdi_sig_v + hdj_sig_v) / j_denom

adapt_phys_u, adapt_phys_v

This is the interface slope (in physical space).

hdi_sig_u = hdi_sig_phys(I,j,K)**2
hdj_sig_u = 0.25 * ((hdj_sig_phys(i,J,K)**2 + hdj_sig_phys(i+1,J-1,K)**2) + &
            (hdj_sig_phys(i+1,J,K)**2 + hdj_sig_phys(i,J-1,K)**2))
i_denom = hdi_sig_u + hdj_sig_u + dk_sig_u

phys_u(I,j,K) = (hdi_sig_u + hdj_sig_u) / i_denom

hdj_sig_v = hdj_sig_phys(i,J,K)**2
hdi_sig_v = 0.25 * ((hdi_sig_phys(I,j,K)**2 + hdi_sig_phys(I-1,j+1,K)**2) + &
            (hdi_sig_phys(I,j+1,K)**2 + hdi_sig_phys(I-1,j,K)**2))
j_denom = hdi_sig_v + hdj_sig_v + dk_sig_v

phys_v(i,J,K) = (hdi_sig_v + hdj_sig_v) / j_denom

adapt_limiting_density

This is the difference between the unlimited and limited slope flux, across the participating adjacent cells. dz_s_i and dz_s_j are the density adaptivity flux terms, and dz_p_unlim is the unlimited component of that.

limiting_density(i,j,K) = limiting_density(i,j,K) + (dz_s_i(I,j) - dz_p_unlim_i) + (dz_s_j(i,J) - dz_p_unlim_j)
limiting_density(i+1,j,K) = limiting_density(i+1,j,K) + (dz_s_i(I,j) - dz_p_unlim_i) + (dz_s_j(i,J) - dz_p_unlim_j)

adapt_limiting_smoothing

Similar to above, but on the smoothing term.

adapt_disp_density, adapt_disp_smoothing

The total interface displacement due to the density and smoothing terms, respectively.

disp_density(i,j,K) = 0.25 * G%IareaT(i,j) / L_to_H &
  * ((G%dyCu(I,j) * dz_s_i(I,j) - G%dyCu(I-1,j) * dz_s_i(I-1,j)) &
  +  (G%dxCu(i,J) * dz_s_j(i,J) - G%dxCu(i,J-1) * dz_s_j(i,J-1)))

disp_smoothing(i,j,K) = 0.25 * G%IareaT(i,j) / L_to_H &
  * ((G%dyCu(I,j) * dz_p_i(I,j) - G%dyCu(I-1,j) * dz_p_i(I-1,j)) &
  +  (G%dxCu(i,J) * dz_p_j(i,J) - G%dxCu(i,J-1) * dz_p_j(i,J-1)))

adapt_disp_unlimited

Interface displacement due to the unlimited smoothing term.

AndyHoggANU commented 1 year ago

Great, I'll give it a burl!

AndyHoggANU commented 1 year ago

Reporting back on my (lack of) progress using these diagnostics. Two interesting problems.

  1. When I put out the diagnostics, they all have a strange shape, so can't be immediately plotted:
    >> diags['adapt_denom_v'].shape
    (167, 808, 76)

    Does this mean it is stitching the tiles together incorrectly here? (Note that this happens irrespective of whether OpenMP is on or off.)

  2. Even if I could plot them, all the diagnostics I have tried yield arrays of zeros (irrespective of dt, iters ....)
AndyHoggANU commented 1 year ago

(I'm also getting frequent kernel deaths, trying to establish a pattern as to why ... it's not related to diags or OpenMP, but still trying to figure it out ... Will start a new issue on this.)

angus-g commented 1 year ago

When I put out the diagnostics, they all have a strange shape

What shape do you expect them to have? I don't remember the size of the domain, but that one would be (xh, yq, zi), i.e. an extra point in the Y and Z dimensions. Although they're backwards compared to the normal Fortran convention which could be an issue...

Does this mean it is stitching the tiles together incorrectly here?

There aren't any tiles here, everything is in terms of the full array. OpenMP does shared-memory parallelism, so each process operates on a section of the same underlying memory.

Even if I could plot them, all the diagnostics I have tried yield arrays of zeros

That does indeed sound like an issue!

AndyHoggANU commented 1 year ago

What shape do you expect them to have?

The domain is (160, 800, 75) ... so at least it gets the vertical direction right, but the others are 7 grid points out (which is a strange number to be off by).

angus-g commented 1 year ago

I think that could be explained by the halos, which probably default to 4 in each direction?

angus-g commented 1 year ago

I just did a quick test, and for one iteration on the tub restart, at least a couple of the diagnostics have non-zero values:

adapt_slope_u zero/nonzero: 1726319 8577457
adapt_slope_v zero/nonzero: 1647333 8607803
AndyHoggANU commented 1 year ago

Did you change anything? Let me check my script.

AndyHoggANU commented 1 year ago

Hmm, definitely still getting zeros here:

np.count_nonzero(diags['adapt_slope_u'])
0

Can you try running the first 15 cells of this one? https://github.com/AndyHoggANU/anu-tub/blob/master/diagnostics/AG_VertRegrid.ipynb

angus-g commented 1 year ago

Ah! I wasn't doing a run of ALE before asking for diagnostics. Picked up another edge case... In fortran, if you declare and assign a variable at the same time, that assignment is only executed the first time the function is run! So we had

logical :: do_diag = .true.

But on the first ALE call, there weren't any diagnostics, so they were being disabled, and then never re-enabled. I've updated the build on Gadi, hopefully it'll behave this time!

AndyHoggANU commented 1 year ago

Yep, that's fixed it -- thanks. And I assume I should just delete the first 4 and last 3 rows to get the position of the diagnostic variable about right?

angus-g commented 1 year ago

I think it might actually be first 3 and last 4. It's the difference between the data and computational indices from this table:

domain start end
xh, data 1 167
xq, data 0 167
yh, data 1 807
yq, data 0 807
xh, comp 4 163
xq, comp 3 163
yh, comp 4 803
yq, comp 3 803
angus-g commented 1 year ago

This is fixed by the MOM6 change brought in by 24fe6d8.